













Journal of Experimental Education 








Volume VII 


MARCH, 1939 


Number 3 








FACTORS UNDERLYING UNPREDICTED SCHOLASTIC 
ACHIEVEMENT OF COLLEGE FRESHMEN 


WALTER R. HEPNER 
San Diego State College 


CHAPTER I 
THE PROBLEM 


In recent years much attention has been 
directed to methods of predicting the quality 
of a student’s college work in advance of 
college entrance. Investigations show that 
satisfactory adjustment to college, as measured
in terms of instructors’ grades, is the
resultant of many factors, most of which fall 
into two general groups. The first group com- 
prises the individual’s background: his intel- 
lectual endowment, his social and emotional 
adjustments, his habits and methods of work, 
his achievements in high school, his special 
interests and abilities, and his system of 
values;—all those phases of growth and 
development that have affected his personal- 
ity before he became a college student. The 
second group includes those influences which 
are brought to bear on the student after col- 
lege enrollment: the traditions of the campus, 
the new freedom, the program of extra- 
curricular activities, the quality of college 
instruction, the adaptability of the curriculum 
to the student’s individual needs, the avail- 
able guidance service, as well as the equip- 
ment and management of classrooms, libraries, 
laboratories and health clinics. 

When the freshman stands on the threshold 
of his college experience, what equipment 
does he bring with him that will suggest a 
basis for predicting his academic achieve- 
ment? Thus far the most frequent criteria 
used in predicting college success are high 
school scholarship and college aptitude tests: 
in general, those factors which form the col- 
lege student’s background. However, the 
predictions derived from these sources are not 
completely reliable. Not by any means, for 
example, are all the brilliant high school students
with high rankings on the college aptitude
test highly successful in college. Apparently
academic adjustment in college is
conditioned by forces and patternings of
qualities as indefinite, varied, and complex as 
life itself. 

This investigation is concerned with some 
of those elements in the student’s background 
that are related to his variation from a 
standard of academic achievement in college 
predicted for him on the basis of his aptitude 
and his scholastic achievement in high school. 


I. SPECIFIC STATEMENT OF THE PROBLEM


Statement of the Problem. Specifically, the 
problem of this investigation is to answer the 
following question: 

What are some of the factors underlying 
the unpredicted academic achievement of 
college freshmen? 


II. DELIMITATIONS 


Ordinarily when a student’s academic
adjustment is under consideration reference 
is made to his grade point average as low, 
average, or high as compared with a standard 
that has been adopted arbitrarily for the 
institution. Academic adjustment for the 
purpose of this investigation is defined in 
terms of the comparability of the student’s 
achievement as measured by instructors’ 
grades received during his freshman year, 
with the achievement indicated for him by a 
prediction equation. This equation has been 
derived from the scholarship records for the 
student’s last three years in high school,¹
percentile rank on the American Council on 
Education Psychological Examination for Col- 
lege Freshmen, and the freshman scholarship 
records of a large sampling of students who 
entered in September and completed one con- 
tinuous year of work in San Diego State Col- 
lege during either 1934-1935 or 1935-1936. 

1 The majority of freshmen in the population studied
come from senior high schools. Hence transcripts do not
include the record of work taken in the ninth grade.


A delimitation of this investigation is also
recognized in the use of school marks as instruments
of measurement. Frasier and Heilman²
in their calculations of correlations
between the Thorndike Intelligence Examination
and college achievement found an average
coefficient of correlation of 0.45, with a range
from 0.24 to 0.57 when instructors’ subjective
marks were used, and a coefficient of 0.60
and a range from 0.46 to 0.69 when objective
achievement examinations were administered.
The author, in the absence of objective measurements,
makes use of instructors’ marks
with the point of view as expressed by Toops
and Kuder, who state:


Admittedly, grades are a sorry measure 
of the success with which an individual 
meets the college situation. Yet these have 
been used universally as criteria for lack 
of anything better. Grades are a hodge- 
podge of many characteristics of the 
individual, the instructor, the course or 
courses taken, and the situation.³


This investigation is further delimited to a 
consideration of those background character- 
istics, adjustments, and achievements of stu- 
dents as measured by the scholarship grade 
point average for the last three years of high 
school and for the first year of college;⁴
a vocational questionnaire; the American
Council on Education Psychological Examination
for College Freshmen, 1931, 1934, and
1936 editions; the Bell Adjustment Inventory;
the Sones-Harry High School
Achievement Tests, Form A; the Shank Tests
of Reading Comprehension, Test III, Form C;
the Progressive Mathematics Tests, Advanced,
Form A, Tests III and IV; and the
Barrett-Ryan English Test, Form XII.


The population of the groups upon which 
this investigation was centered was limited 
to students who had been in full time attend- 
ance in the San Diego State College. 


2 George W. Frasier and J. D. Heilman, “Experiments in
Teacher-College Administration, III: Intelligence Tests,”
Educational Administration and Supervision, XIV (April,
1928), 276.

3 Herbert A. Toops and G. Frederick Kuder, “Psychological
Tests,” Review of Educational Research, V, No. 3 (June,
1935), 217.

4 While it is recognized that a grade point average obscures
a student's success in certain subjects, it is generally used
in institutions of higher education as the criterion of
scholastic success. See discussion of its application in David
Segel, Prediction of Success in College (Office of Education
Bulletin No. 15; Washington, D. C.: Office of Education,
1934), pp. 52-56.


While it was recognized that sex differences
operate to produce differences in college
achievement between men and women,⁵ no
segregation of the sexes was made in this
study. This procedure was necessary because
the number of subjects for each group would
have been too small for statistical analysis.


The experimental group was limited to
students who had completed their senior high
school course in the usual three year period,
who transferred directly to college, and who
enrolled in twelve or more units of college
work each semester during the freshman
year.⁶ This delimitation was admitted in
order that the findings might be based on 
data concerning typical college freshman 
students. 


CHAPTER II 


MATERIALS, SUBJECTS, AND STATISTICAL
PROCEDURE USED IN
THE INVESTIGATION 


I. INTRODUCTION 


For the purpose of this investigation only 
those selected measures were used that had 
been previously developed by other workers. 
The limitations of such measures are fully 
recognized. Likewise, established statistical 
procedures were employed throughout. In 
this chapter will be given a description of the 
sources and nature of the data and of the 
statistical procedures that were used in this 
investigation. 


II. MATERIAL 


Source and Type of Data. The original 
data were obtained from the office of the 
Registrar of the San Diego State College. 
These consisted of (1) the scholastic records
of the experimental population in their last
three years in high school; (2) the scholastic
records of the experimental groups for their
first year of college work; a vocational
questionnaire answered by each subject at
time of registration; the percentile scores on
the American Council on Education Psychological
Examination for College Freshmen,
1930, 1931, and 1934 editions for the prediction
group, and the 1936 edition for the
experimental group; (3) scores on the Bell
Adjustment Inventory, and scores on each of
four achievement examinations, namely, the


a. Barrett-Ryan English Test, Form XII.

b. Progressive Mathematics Tests, Advanced
Form A, Test 3 (Mathematical
Reasoning) and Test 4 (Mathematical
Fundamentals).

c. Shank Tests of Reading Comprehension,
Test III, Form C.

d. Sones-Harry High School Achievement
Test, Form A.


5 Wagner reported that boys who deviated negatively from
predicted college grades exceeded the number of positive
deviates, whereas the opposite was true for the girls. See
Mazie Earle Wagner, “Studies in Academic Motivation,”
Studies in Articulation of High School and College (University
of Buffalo Studies, XIII; Buffalo, N. Y.: University of
Buffalo, 1936), p. 192.

6 While neither the completion of the senior high school
course in less than three years nor the taking of an additional
post graduate year have been proved by investigators
to influence college achievement, it has been shown that those
students who go to college after spending some time outside
of school are remarkably successful in college considering
their high school achievement. Wagner, op. cit., p. 222.


All of these tests and the vocational ques- 
tionnaire had been administered in accordance 
with standard procedures during the regular 
activities of registration at the beginning of 
the school year in September, 1936. 


Selection of Material. These materials 
were selected both because of their avail- 
ability and their practicability. For several 
years the Committee on Tests and Measure- 
ments of the Faculty of the San Diego State 
College had been endeavoring to select a list 
of standardized tests which could be readily 
administered to all freshmen with a minimum
of effort and expense, and with a maximum of
value to administrators and personnel workers.
From the evaluation of the experience gained 
in administering and using the data of vari- 
ous tests, and from the judgments made con- 
cerning available tests, the committee and 
administration adopted the list as presented 
above. Thus, the validity of the judgments 
made in the original selection and adoption 
of the tests was assumed. And, furthermore, 
it was assumed that the potential practical 
values of the investigation would be enhanced 
if materials already incorporated in the 
administrative and guidance procedures of 
the institution were used. At the same time, 
it was recognized that this assumption would 
necessarily delimit the study to the measures 
already in use, as well as to the validity and 
reliability of those measures. This procedure 
was considered justifiable on the grounds that 
the investigator in the field selected must 
necessarily use those tools which have already 
been developed. 




III. THE SUBJECTS


Experimental Populations. Since one of 
the objectives of the study was to acquire 
information and insights that would be useful 
in directing changes in administrative, per- 
sonnel, and curricular practices in the San 
Diego State College, the experimental popu- 
lations were selected entirely from that insti- 
tution. Two different groups were involved, 
namely, the experimental group, and the 
group used for the purpose of developing the 
prediction formula which was to be applied 
to the experimental group. 

Experimental Group. The experimental 
group consisted of all beginning freshmen
who enrolled in the college for twelve or more 
units of work in September, 1936, and who 
had completed a year’s work by June, 1937, 
with a minimum of twelve units each 
semester. The group was further limited to 
those students who had completed their 
regular senior high school course in three 
years, had not taken post graduate work in 
high school, and who transferred directly from 
high school to college. The group selected in 
this way had a membership of three hundred 
and eighty-two. 

Prediction Group. The group used for the 
development of the prediction formula con- 
sisted of a random sample of six hundred 
members of the freshman classes of September, 
1934 and September, 1935. The criteria ap- 
plied in the selection of members of the experi- 
mental group were also used for this group. 
However, instead of taking the whole of one 
year’s freshman class alone or the total of 
both years’ classes, the first three hundred 
names on an alphabetically arranged list for 
each class were selected. By this method a 
more representative sampling of subjects for 
use in prediction in the institution was pro- 
vided than if the group had been limited to 
a single year’s class. Three hundred mem- 
bers of each of the two classes were used 
because that was the maximum number that 
could be secured for the class of 1934.


IV. STATISTICAL PROCEDURES 


A. Scores Used 


High School and College Scholarship Aver- 
ages. Most of the high school transcripts of 
record employed the same five-point marking 
system as that used in the college, namely, 
A, B, C, D, and F. In every case transcripts 
















which carried other types of marking symbols 
supplied a transmutation table which was 
applied in order to express grades in symbols 
on the letter scale. For statistical use these 
marks were assigned the numerical weightings 
of 4, 3, 2, 1, and 0, respectively. Numerical
weightings for college marks were 3, 2, 1, 0,
and -1 for grades A, B, C, D, and F, respectively.¹
After the assignment of weightings
the grade point ratios or averages for scholar- 
ship in both high school and college were 
figured for each subject by dividing the total 
grade points earned by the number of units 
attempted. All weightings and calculations of 
grade point averages were made by the 
regularly employed personnel of the Regis- 
trar’s office, supplemented by trained assist- 
ants. All work was checked for accuracy by 
a trained statistician. 
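
The weighting and averaging described above can be illustrated with a minimal Python sketch. The weights are those stated in the text; the record layout (a list of letter grade and unit pairs), the function name, and the sample transcript are assumptions made only for illustration.

```python
# Letter-grade weights as stated in the text.
HIGH_SCHOOL_WEIGHTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
COLLEGE_WEIGHTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": -1}


def grade_point_average(marks, weights):
    """Return total grade points earned divided by units attempted.

    `marks` is a list of (letter_grade, units) pairs; this layout is an
    assumption made for illustration, not the Registrar's actual record form.
    """
    total_units = sum(units for _, units in marks)
    if total_units == 0:
        raise ValueError("no units attempted")
    total_points = sum(weights[grade] * units for grade, units in marks)
    return total_points / total_units


# Hypothetical transcript, scored on both scales.
transcript = [("A", 3), ("B", 3), ("C", 4), ("F", 2)]
hs_average = grade_point_average(transcript, HIGH_SCHOOL_WEIGHTS)
college_average = grade_point_average(transcript, COLLEGE_WEIGHTS)
```

Because every college weight is exactly one point below the corresponding high school weight, the same set of marks yields averages that differ by a constant of one, which leaves correlations between the two series unaffected.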

Psychological Examination Scores. Per- 
centile ranks were used instead of raw scores 
on the five separate tests of the American 
Council on Education Psychological Examin- 
ation for College Freshmen. This procedure 
was followed because it made possible the 
inclusion of scores on three different editions 
of the examination in one distribution which 
was necessary when developing the prediction 
formula. Since this prediction formula was 
based in part upon percentile ranks it is 
obvious that the scores on the 1936 edition 
with which the formula was applied must
likewise be expressed in percentiles. 

The Bell Adjustment Inventory Scores. 
Raw scores on each of the four parts of the 
test were recorded, namely, Home Adjust- 
ment, Health Adjustment, Social Adjustment, 
and Emotional Adjustment. 

Achievement Test Scores. The raw scores 
on the Barrett-Ryan English Test, and on 
the Shank Tests of Reading Comprehension 
were used. Gross scores were recorded for 
Test 3, Mathematical Reasoning, and Test 4,
Mathematical Fundamentals, of the Progressive
Mathematics Tests, Advanced Form
A, and for the total and the four sections of
the Sones-Harry High School Achievement
Test, namely, Language and Literature,
Mathematics, Natural Science, and Social 
Science. 


1 Different weightings were used for the high school grades
to avoid the extra effort involved in using negative numbers
in calculation. Additional time was saved by applying the
weighting involving a use of the negative number in college
grades because these had been determined by the Registrar’s
office in accordance with customary practice. This
was justified on the ground that a different system of weightings
had no effect upon the accuracy of the statistical
treatment.




B. Tabulation of the Data 


For Computing Regression Equation (the 
Prediction Formula). Data from the Regis- 
trar’s records were entered on three by five 
cards for each of the six hundred cases. After 
proper sortings, entries were made in the 
appropriate cells on correlation charts. 

For Original Data Concerning the Experi- 
mental Group. All data secured from the 
Registrar’s records, together with the pre- 
dicted grade point averages, as calculated by 
the use of the regression equation for each 
subject, and the group classification of each 
subject derived as described below² were
entered on master original data sheets, 
samples of which will be found in the Appen- 
dix of this treatise. 


C. Formulas 


All formulas were taken from standard 
textbooks in educational statistics.³


D. Combination of Criteria in 
Regression Equation 


Regression equations have been used by 
various investigators in an effort to improve 
predictive efficiency over that possible by the 
use of a single factor. They generally agree 
that more reliable predictive indices can be 
derived from a combination of two or more 
factors than from one of them alone. 

Douglass⁴ found, after calculating eighteen
different multiple correlation coefficients, that 
the highest coefficients from two variables are 
obtained from the use of high school marks 
and percentile rank on the American Council 
on Education Psychological Examination for 
College Freshmen. He reported a multiple 
correlation of .626, and also that a third 
variable adds little to the predictive merit of 
a combination of two variables. In his sur- 
vey of ten reliable studies which reported the 
use of various combinations of two or more 
prognostic variables, Douglass⁵ found general
agreement with his own conclusions. They
showed a median multiple correlation of .61,
while the median correlation for high school
marks and intelligence was .58.


2 See Chapter III, infra.

3 Henry E. Garrett, Statistics in Psychology and Education
(New York: Longmans, Green and Company, 1926),
and Karl J. Holzinger, Statistical Methods for Students in
Education (Chicago: Ginn and Company, 1928).

4 Harl R. Douglass, The Relation of High School Preparation
and Certain Other Factors to Academic Success at the
University of Oregon (University of Oregon Publication, III,
No. 1; Eugene, Oregon: University of Oregon, September,
1931), p. 49.

5 Ibid., p. 50.
















Wagner⁶ agrees with Douglass that a com-
bination of the better measures is usually 
found to be more predictive than the best 
single one. She confirms her statement by 
showing a median multiple correlation coeffi- 
cient of .67 for twenty-four authors which 
she lists. The median multiple correlation 
coefficient obtained for studies using high 
school marks and intelligence test scores⁷
was .57. 

Symonds, in summarizing four independent 
studies by May, Wood, Johnston, and 
Symonds, states: 


In each of these four studies both the 
intelligence test and high school marks 
supplement each other so that taken in 
combination they predict college success 
better than either one singly.⁸


The author concludes: 


That no one factor predicts college suc- 
cess adequately and that any criterion of 
college success should be the composite of 
several objectively measurable factors.⁹


Odell,¹⁰ after indicating an expected range
in correlation from 0.40 to 0.50 between
intelligence test score and college scholarship,
states that a combination of test score and
high school marks may be expected to yield
correlations of about 0.60 or higher with
college grades. 

Brammel,¹¹ in his survey of prediction
studies also points out that many investi- 
gators have obtained better results by 
employing a combination of criteria than by 
confining prediction to one only. 

A. B. Crawford¹² reported correlations from
0.68 to 0.74 on a combination of transmuted
high school scholarship, College Entrance
Examination Board averages, scholastic aptitude
test scores, and age at entrance.


6 Mazie Earle Wagner, “A Survey of the Literature on
College Performance Prediction,” Studies in Articulation of
High School and College (University of Buffalo Studies, IX;
Buffalo, N. Y.: University of Buffalo, 1934), p. 198.

7 Ibid., p. 199.

8 Percival M. Symonds, Measurement in Secondary Education
(New York: The Macmillan Company, 1928), p. 423.

9 Ibid., p. 425.

10 Charles W. Odell, Predicting the Scholastic Success of
College Freshmen (Bureau of Educational Research, Bulletin
[number illegible]; Urbana: University of Illinois, September 13, 1927),
p. [illegible].

11 P. Roy Brammel, “Articulation of High School and
College,” The Reorganization of Secondary Education (Office
of Education Bulletin, No. 17, 1932, National Survey of
Secondary Education Monograph No. 10. Washington, D. C.: Office of
Education, 1933), p. 25.

12 A. B. Crawford, “Forecasting Freshmen Achievement,”
School and Society, XXXI (January 25, 1930), 125-132.


In view of the findings of Douglass and 
other investigators and because of the 
availability of data, the combined criteria of 
high school marks and percentile scores on 
the American Council on Education Psycho- 
logical Examination for College Freshmen 
were used in the development of the regres- 
sion equation for predicting college achieve- 
ment for the purpose of this investigation. 
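
A two-predictor regression equation of the kind adopted here can be fitted by ordinary least squares. The sketch below is illustrative only: the function name, the sample values, and the coefficients used to build the sample are hypothetical and are not the equation or data of this study.

```python
def fit_two_predictor_regression(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    if not (len(x1) == len(x2) == n) or n < 3:
        raise ValueError("need three equal-length series with at least 3 cases")
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products about the means.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Cramer's rule on the two normal equations.
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Hypothetical cases: high school average, percentile rank, college average.
hs_gpa = [2.0, 2.5, 3.0, 3.5, 4.0, 2.2]
percentile = [30, 80, 45, 60, 95, 10]
college_gpa = [-0.5 + 0.5 * h + 0.01 * p for h, p in zip(hs_gpa, percentile)]

a, b1, b2 = fit_two_predictor_regression(hs_gpa, percentile, college_gpa)
predicted = a + b1 * 3.2 + b2 * 70  # predicted college average for a new case
```

The sample college averages were constructed from known coefficients so that the fit can be checked; real records would of course scatter about the fitted plane.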


E. Method of Classifying Subjects 
Into Groups 


This investigation was concerned with dis- 
covering the characteristics of the subjects, 
classified as students of good promise or stu- 
dents of poor promise, whose work was 
graded either better or poorer by the instruc- 
tors than was predicted in each case by the 
personnel workers.¹³ It therefore became
necessary to define precisely the meaning of
the terms “good promise,” “poor promise,”
“better than predicted,” and “poorer than
predicted.” These terms were defined in a
way that produced a number of cases in 
extreme groups that could be studied statis- 
tically. 

Wagner,¹⁴ in a study of inconsistent or
unpredicted performance in college, divided 
and classified her population into five groups. 
She did this by plotting a graph of high school 
and college grades upon which she drew a 
trend line to represent the average college 
grade made by students of a particular high 
school average. All students whose college 
marks were either reliably above or below the 
average mark made by those representing a 
certain average on the New York Regents’ 
Examination (the measure of high school 
achievement) were selected for her study. For 
the purpose of reliability she chose only 
those cases whose actual college grade varied 
from the predicted score more than the prob- 
able error of the estimate. Cases were 
classified into the five groups in terms of 
probable error deviations, as follows: 


1. Those who obtained college marks very
much better than would have been predicted
(at least two probable errors);

2. Those who obtained college marks
somewhat better than would have been
predicted (one to two probable errors);

3. Those who were non-deviates, or who
obtained college marks within 1 P. E.
of those that would have been predicted
for them;

4. Those who obtained college marks somewhat
poorer than would have been predicted
(one to two probable errors);

5. Those who obtained college marks very
much poorer than would have been predicted
(at least two probable errors).


13 The regression equation, as derived from data concerning
high school marks and percentile scores and presented in
Chapter III, provided the means for obtaining the predicted
college grade point average of each subject.

14 Wagner, “Studies in Academic Motivation,” Studies in
Articulation of High School and College, pp. 188-19


She called groups 1 and 2 the positive 
deviates and groups 4 and 5 the negative 
deviates. 
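
Wagner's five-fold classification, as summarized above, amounts to expressing each student's deviation from prediction in probable-error units. A minimal sketch follows; the function name, the treatment of cases falling exactly on a boundary, and the probable error of 0.3 used in the example are assumptions, not details reported by Wagner.

```python
def classify_by_probable_error(obtained, predicted, probable_error):
    """Return Wagner's group number (1 to 5) for one student."""
    if probable_error <= 0:
        raise ValueError("probable_error must be positive")
    # Deviation from prediction, in probable-error units.
    deviation = (obtained - predicted) / probable_error
    if deviation >= 2:
        return 1  # very much better than predicted
    if deviation >= 1:
        return 2  # somewhat better than predicted
    if deviation > -1:
        return 3  # non-deviate, within 1 P. E. of prediction
    if deviation > -2:
        return 4  # somewhat poorer than predicted
    return 5      # very much poorer than predicted


# Hypothetical students, all predicted at 1.0, with an assumed P. E. of 0.3.
groups = [classify_by_probable_error(g, 1.0, 0.3) for g in (2.0, 1.4, 1.1, 0.6, 0.2)]
positive_deviates = [g for g in groups if g in (1, 2)]
negative_deviates = [g for g in groups if g in (4, 5)]
```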

The method of classifying subjects into 
groups in this investigation differed from 
Wagner’s in several respects, namely, (a) in 
the application of combined criteria for pre- 
diction, instead of a single criterion, (b) in 
setting the limits of groups in terms of 
standard deviation instead of probable error, 
and (c) in grouping subjects not only on the 
basis of obtained college marks that were 
better or poorer than predicted, but also on 
the basis of the quality of the predicted col- 
lege marks. The difference mentioned first 
was due to an effort to improve prediction; 
the second has no real significance, being 
merely a matter of choice between statistical 
formulae; and the third difference was 
justified on the basis that the segregation of 
the deviates into subgroups might serve to 
disclose more facts than could come to light 
when all students of all degrees of promise 
were grouped together. Consequently, a 
classification procedure was devised which 
would arrange subjects in groups large enough 
for statistical calculation, both in terms of a 
scholarship quality scale and of deviation from 
prediction. 

The procedure involved the following 
steps: 


1. A graph was plotted of the predicted 
college grade point averages and the differ- 
ences between the predicted college grade 
point averages and the obtained college grade 
point averages. 

2. This graph was divided into nine sec- 
tions, each containing approximately the 
same number of cases. The division was made 
by setting the limits of the middle group on 
the X axis at plus one-half sigma and minus 
one-half sigma and the middle group on the 
Y axis likewise at plus one-half sigma and 
minus one-half sigma. The upper and lower
groups were limited by plus one-half sigma
at the top of the range and by minus one- 
half sigma at the bottom of the range 
respectively. 

3. Each of the nine groups of subjects was 
assigned a code number and described as 
follows:¹⁵


1A—Good positive deviates. Those who 
were predicted to do good college 
work but did better. 

1B—Good non-deviates. Those who were 
predicted to do good work and 
obtained the record predicted. 

1C—Good negative deviates. Those who 
were predicted to do good work but 
did poorer. 

2A—Average positive deviates. Those who 
were predicted to do average work 
but did better. 

2B—Average non-deviates. Those who 
were predicted to do average work 
and obtained records predicted. 

2C—Average negative deviates. Those who 
were predicted to do average work 
and did poorer. 

3A—Poor positive deviates. Those who 
were predicted to do poor work but 
did better. 

3B—Poor non-deviates. Those who were 
predicted to do poor work and 
obtained record predicted. 

3C—Poor negative deviates. Those who 
were predicted to do poor work and 
did poorer. 


It will be noted that the term “average 
work” was defined arbitrarily as that repre- 
sented by a grade point average that fell 
between plus one-half sigma and minus one- 
half sigma of the distribution of predicted 
college grade point averages, and that “good 
work” and “poor work” represented grade 
point ratios that fell above plus one-half 
sigma, and below minus one-half sigma, 
respectively. It will be noted, also, that the 
term “obtaining the record predicted” was 
defined arbitrarily as the achievement of a 
grade point average whose difference from the 
predicted average fell within plus one-half 
sigma and minus one-half sigma of a distri- 
bution of the difference between these meas- 
ures for the total population. Students classi- 
fied as “better than predicted” or “poorer 
than predicted” were therefore described as 


% The code designations and limits of the groups are 
shown diagrammatically in Chart 1, page 20. 




















March, 1939] 
CHART / 


LUdITS OF THE SUB- GROUPS OF THE 
ToTaL EXPERIMENTAL POPULATION 


Predicted 











Dredicted poor work Average wort Predicted good werk 
x3 my 
4 ’ A 
$s ' Tota/ 
i 3A 2\A /A —. 
. Door positive dewates posite Good positive denotes 
$ aevigtes 1 
: <2 coses 34 dases IS cases 
N +50 ! 
S 733 , 
z ° 
. 38 2\8 ‘8 _ 
be Door. nen. deviates_ | Average | Good nen-deviotes \aenasp, 
Hy 7en- Tes 
+3 " /S4 
> 47 coses 66 cpses 4/ cases (O3e5 
s ~ 56 ! 
Ss =f | 
¢ c 
N } Tote! 
& 3c 2C 1c negate 
ny Door negetive aenates A a Good negotwe deriates janet 
§ Ps “r 
g 36 cases 50 lases JI/ cases cases 
Sk aie [ero Selene 

















36 Mii? 195 
Preaicted Grade Pont Averages 


having achieved college averages whose 
differences from those predicted fell above 
one-half sigma or below one-half sigma, res- 
pectively, of the distribution of the differences 
between the predicted and the obtained col- 
lege grade point averages for the whole 
population. 

It will be further noted that in the group 
code numbers the numerals 1, 2, and 3, 
signify the quality of the predicted college 
marks as good, average, and poor, respec- 
tively, while the letters A, B, and C indicate 
variation from prediction, as, better than pre- 
dicted (positive deviates), as predicted (non- 
deviates), and poorer than predicted (nega- 
tive deviates), respectively. 

It will be readily seen that each code num- 
ber carries a key to the description of a cer- 
tain group. For example, 1A represents the 
group of students predicted to do good work 
whose performance exceeded that predicted. 
The code number 1C signifies a group pre- 
dicted to do good work but who fell short 
of the estimate. 
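
The nine-fold coding can be sketched in Python as follows. The sketch assumes that the half-sigma limits are measured from the mean of each distribution, that the population standard deviation is intended, and that the difference is taken as obtained minus predicted; the function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def classify_nine_groups(predicted, obtained):
    """Return a code such as '1A' or '3C' for each student.

    Numerals: 1 = predicted good, 2 = predicted average, 3 = predicted poor.
    Letters: A = better than predicted, B = as predicted, C = poorer.
    """
    differences = [o - p for p, o in zip(predicted, obtained)]
    p_mean, p_sigma = mean(predicted), pstdev(predicted)
    d_mean, d_sigma = mean(differences), pstdev(differences)

    def band(value, centre, sigma, labels):
        # labels[0] above +0.5 sigma, labels[2] below -0.5 sigma, else middle.
        if value > centre + 0.5 * sigma:
            return labels[0]
        if value < centre - 0.5 * sigma:
            return labels[2]
        return labels[1]

    return [
        band(p, p_mean, p_sigma, "123") + band(d, d_mean, d_sigma, "ABC")
        for p, d in zip(predicted, differences)
    ]


# Three hypothetical students who all obtained an average of 1.0.
codes = classify_nine_groups([0.5, 1.0, 1.5], [1.0, 1.0, 1.0])
```

With these sample values the student predicted lowest is coded as a poor positive deviate, the middle student as an average non-deviate, and the student predicted highest as a good negative deviate.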

The setting of the lower limits of the A 
group at plus one-half sigma, and the upper 
limit of the C group at minus one-half sigma 
of the distribution of the differences between 
the obtained and predicted averages, was 
justified by the fact that this investigation 
was concerned primarily with the extreme
groups between which there was relatively
little probability of overlapping. Since the 
two groups were separated by one sigma of 
this distribution, and since one sigma was 
approximately equal to 1.41 times the probable
error of the estimate,¹⁶ the chances were
only about one in three that any true score
in either group would fall in the other. The
setting of the lower and upper limits of groups
1 and 3, respectively, was justified in like
manner with the chances of one to one¹⁷ that
any true score in either group would fall in
the other.¹⁸
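
The arithmetic behind the "one in three" figure can be checked with a short sketch. It uses the sigma (.45) and probable error (.321) reported in the footnotes, the standard relation that one probable error equals 0.6745 standard deviations, and the assumption (an interpretation, not stated in the text) that the chance in question is the two-sided normal probability of an error larger than the gap between the groups.

```python
import math

SIGMA_OF_DIFFERENCES = 0.45         # reported in the study's footnote
PROBABLE_ERROR_OF_ESTIMATE = 0.321  # reported in the study's footnote
PE_PER_SIGMA = 0.6745               # one probable error in standard deviations

# Gap between the A and C groups, in probable errors of the estimate.
gap_in_pe = SIGMA_OF_DIFFERENCES / PROBABLE_ERROR_OF_ESTIMATE

# The same gap in standard errors of the estimate.
gap_in_sd = gap_in_pe * PE_PER_SIGMA

# Two-sided normal probability of an error exceeding the gap.
chance = math.erfc(gap_in_sd / math.sqrt(2))
```

Under these assumptions the gap comes to about 1.4 probable errors and the chance to roughly one in three, in agreement with the figures given in the text.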


F. Method Used to Compare Sub-Groups on 
the Measures Applied 


Throughout this treatise the groups are 
presented in the paired combinations as 
follows: 


Group                                         Group
1A, predicted good positive deviates     with 1C, predicted good negative deviates.
1A, predicted good positive deviates     with 3A, predicted poor positive deviates.
1A, predicted good positive deviates     with 3C, predicted poor negative deviates.
1C, predicted good negative deviates     with 3A, predicted poor positive deviates.
1C, predicted good negative deviates     with 3C, predicted poor negative deviates.
3A, predicted poor positive deviates     with 3C, predicted poor negative deviates.
A, predicted total positive deviates     with C, predicted total negative deviates.



It will be observed that these paired group- 
ings represent every possible combination of 
the extremely deviating sub-groups of the 
total experimental population and that the 
middle groups have been omitted, except when 
presented in combination with the total 
grouping of positive or negative deviates. 
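
That the six pairings exhaust the combinations of the four extreme sub-groups can be confirmed mechanically. The short sketch below, with variable names chosen only for illustration, enumerates them and appends the comparison of total positive with total negative deviates.

```python
from itertools import combinations

# The four extremely deviating sub-groups named in the text.
EXTREME_GROUPS = ["1A", "1C", "3A", "3C"]

# Every unordered pair of the four extreme sub-groups.
pairs = list(combinations(EXTREME_GROUPS, 2))

# The seventh comparison: total positive against total negative deviates.
pairs.append(("A", "C"))
```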

The reliability of the differences between 
the various sub-groups on the various meas- 
ures was determined by use of the usual 
formula applied when computing the standard 
error of the differences between their means. 
In order to expedite the making of comparisons between groups the
reliability of the differences between the means for each measure
was expressed in terms of the ratio of the
difference to its standard error. Thus the


16 The sigma of the distribution was .45. The probable
error of the estimate was .321.

17 The sigma of the distribution was .45. The probable
error of the estimate was .321.

18 As will be noted in Chapter III, this procedure resulted
in the classification of subjects into groups which showed
completely reliable statistical differences on the measures used
in their organization.













probability that the differences were greater than zero could be
directly ascertained by inspection of the ratio. According to general
practice a ratio of three is accepted as indicative of complete
reliability.¹⁹ Therefore, any value of less than three falls short of
complete statistical reliability, and any value greater than three
indicates added reliability. Furthermore, once a critical ratio is
obtained whose magnitude is less than three, the chances in one hundred
that the difference between the means of the measures is greater than
zero can be readily read from a statistical table.²⁰
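The ratio described above can be illustrated with a short sketch. It assumes the "usual formula" is the standard error of the difference between two independent means, with each mean's standard error taken as its standard deviation divided by the square root of N. It further assumes that the tabled "chances in 100" correspond to the one-sided area under the normal curve. Function names are hypothetical.

```python
import math


def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Return (D / S.E. diff., S.E. diff.) for two independent group means."""
    # Assumed formula: S.E. of each mean is sd / sqrt(n), combined in quadrature.
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff, se_diff


def chances_in_100(ratio):
    """Chances in 100 that the true difference exceeds zero (normal-curve assumption)."""
    return 100.0 * 0.5 * (1.0 + math.erf(abs(ratio) / math.sqrt(2.0)))


# Groups 1A and 1C on obtained-minus-predicted averages (values from Table VIII).
ratio, se_diff = critical_ratio(0.62, 0.24, 35, -0.45, 0.20, 31)
print(round(se_diff, 2), round(ratio, 2), round(chances_in_100(ratio)))
```

With the Table VIII figures for groups 1A and 1C this gives a standard error of about 0.05 and a ratio of about 19.75, in line with the corresponding row of Table X.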


CHAPTER III 
STATISTICAL PREPARATION OF DATA 


This chapter describes the populations of the various groups of subjects
involved in this investigation, namely, the prediction group, the total
experimental group, and the sub-groups of the experimental group. It
also considers the statistical preparation of the data preliminary to
the analysis of the characteristics of the sub-groups in terms of the
measures used in this study.

Description of the Prediction Population. 
The population consisted of six hundred 
freshmen who had completed one year of col- 
lege work selected as described in Chapter II. 
They represented a reasonably valid sampling 
as indicated by the distribution of the three 
measures applied, namely, college grade point 
average, high school grade point average, and 
American Council percentile scores, as shown 
in Table I. 

Coefficients of Correlation. Scatter dia- 
grams were prepared with the ranges divided 
into an appropriate number of intervals in 
each case as follows: high school grade point 
averages, fourteen; college grade point aver- 
ages, eighteen; and American Council per- 


19 Garrett, op. cit., p. 133.
20 For the table used in this investigation see Garrett,
op. cit., p. 134.




centile scores, twenty. Following the usual 
procedure the six hundred individual data 
cards were sorted into piles, representing the 
various cells, and counted. The number in 
each cell was entered in the proper cell of the 
scatter diagram. Zero order coefficients of correlation were obtained
in the usual manner by use of the Pearson product-moment formula. The
probable errors of the coefficients were taken from a statistical
table.¹ The coefficients of correlation and their probable errors, thus
determined, are given in Table II. By referring to this table it may be
seen that all the coefficients of correlation obtained were
approximately equal to the medians for similar measures as reported in
other investigations.²
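The probable errors read from the statistical table can be reproduced with a small sketch, assuming the table tabulates the conventional formula P.E. = 0.6745(1 - r²)/√N. That formula is an assumption here; the text only says the values were taken from a table.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Probable error of a Pearson r under the conventional formula (assumed)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Coefficients reported for the prediction population of 600 cases.
for r in (0.409, 0.437, 0.524):
    print(r, round(probable_error_of_r(r, 600), 3))
```

For the three coefficients of the prediction population this returns values that round to .023, .022, and .020, the probable errors reported in Table II.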

The multiple correlation coefficient for the 
two prognostic variables, namely high school 
averages and American Council percentile 
scores, was .561. Since the coefficient of cor- 
relation between college averages and the 
single variable of high school averages was 
.524, it would appear that the addition of a 
second variable in the computations adds 
little to the accuracy of prediction. In view 
of the slight increase in predictive accuracy 
of the combined criteria over the single 
criterion, and because of the large amount of 
labor required for computing the multiple 
correlation coefficient, it would seem that for 
general practical purposes, as distinguished 
from research, the use of the single prediction 
criterion of high school averages should 
suffice. 
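The multiple correlation coefficient can be recomputed from the three zero order coefficients as a check. The sketch below uses the standard two-predictor formula; the function name is illustrative and the formula is assumed to be the one the author applied.

```python
import math


def multiple_r(r12: float, r13: float, r23: float) -> float:
    """Multiple correlation of variable 1 with variables 2 and 3 combined."""
    r_squared = (r12 ** 2 + r13 ** 2 - 2.0 * r12 * r13 * r23) / (1.0 - r23 ** 2)
    return math.sqrt(r_squared)


# 1 = college average, 2 = high school average, 3 = American Council percentile.
R = multiple_r(r12=0.524, r13=0.409, r23=0.437)
print(round(R, 3))
```

The result rounds to .561, only slightly above the .524 obtained from high school averages alone, which is the comparison the paragraph above relies on.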

Values Derived for the Regression Equation. The formulae and procedure
described by Holzinger³ were employed with the data contained in
Table I and Table II, to compute the values for the regression equation

$$X_1 = b_{12.3}X_2 + b_{13.2}X_3 + C$$

where $X_1$ signifies the predicted college grade point average;
$X_2$ the high school grade point


1 Garrett, Statistics in Psychology and Education, p. 171.
2 See review of the literature in Chapter II, Section D supra.
3 Holzinger, Statistical Methods for Students in Education, p. 293.


TABLE I
MEANS, STANDARD DEVIATIONS, AND RANGES OF MEASURES OF THE PREDICTION POPULATION
(600 CASES)

                                                                  Standard
Measure                                  Range            Mean    deviation
1. College grade point average           -0.33 to 3.00    1.204    0.574
2. High school grade point average        1.30 to 3.96    2.745    0.549
3. American Council percentile scores       00 to 99      54.92   25.880
















TABLE II
ZERO ORDER COEFFICIENTS OF CORRELATION BETWEEN MEASURES OF THE
PREDICTION POPULATION

                                       3. American Council    1. College grade
                                       percentile scores      point average
1. College grade point average         .409 ± .023
2. High school grade point average     .437 ± .022            .524 ± .020


average, $X_3$ the American Council percentile score, and $C$ a
constant. With the computations thus made the regression equation reads

$$X_1 = .446X_2 + .005X_3 - .294.$$

The standard error of the estimate rendered was .476 and the probable
error of the estimate was .321.
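The regression weights can be recovered from Tables I and II with the usual two-predictor formulas, sketched below. This is an illustration under the assumption that Holzinger's procedure is equivalent to these textbook expressions; variable names are hypothetical. Note that the constant computed from the tabled means comes out negative, at about -0.29.

```python
import math

# Zero order correlations (Table II): 1 = college, 2 = high school, 3 = percentile.
r12, r13, r23 = 0.524, 0.409, 0.437
# Means and standard deviations of the prediction population (Table I).
m1, s1 = 1.204, 0.574
m2, s2 = 2.745, 0.549
m3, s3 = 54.92, 25.880

denominator = 1.0 - r23 ** 2
beta2 = (r12 - r13 * r23) / denominator  # standardized weight for high school average
beta3 = (r13 - r12 * r23) / denominator  # standardized weight for percentile score

b2 = beta2 * s1 / s2                     # raw-score weight, about .446
b3 = beta3 * s1 / s3                     # raw-score weight, about .005
constant = m1 - b2 * m2 - b3 * m3        # intercept, about -0.29

r_squared = beta2 * r12 + beta3 * r13    # squared multiple correlation
se_estimate = s1 * math.sqrt(1.0 - r_squared)
pe_estimate = 0.6745 * se_estimate       # probable error, normal-curve assumption

print(round(b2, 3), round(b3, 3), round(constant, 2))
print(round(se_estimate, 3), round(pe_estimate, 3))
```

The small gaps between these figures and the printed .294 and .476 are consistent with rounding at intermediate steps of hand computation.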

Description of the Total Experimental Population. Following the
procedure described in Chapter II, Section III, 382 subjects were
selected: 190 men and 192 women. While it was recognized that sex
differences operate to produce differences in the scholastic
achievement between men and women, no segregation of the sexes was made
in this study. Wagner reported that boys who deviated negatively from
predicted college grades exceeded the number of positive deviates,
whereas the opposite was true for the girls.⁴ On the basis of her
finding, the inclusion of the men and women together in the various
groupings of deviates constitutes a limitation of this study. However,
this was necessary because the number of subjects for each group would
otherwise have been too small for statistical analysis.

The calculation of the means and standard 
deviations for the 382 cases for the distribu- 
tion of college averages, the high school aver- 
ages, and the American Council percentile 
scores gave the results as presented in 
Table III. 


4 Wagner, “Studies in Academic Motivation,” Studies in
Articulation of High School and College, p. 192.




By applying the regression equation the value of the predicted grade
point average for each of the 382 cases was computed and entered upon
the master data sheet.⁵ Following this, the difference between each
subject’s predicted average and obtained average was computed and
entered on the master data sheet.
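The two computations entered on the master data sheet can be sketched as follows. The constant is taken as negative (-.294), an assumption based on the fact that only a negative constant reproduces the group means of predicted averages in Table VII. Function names are hypothetical, and the group means used below stand in for individual records, which are not given in the text.

```python
def predicted_college_average(hs_average: float, ace_percentile: float) -> float:
    """Predicted college grade point average from the fitted regression equation."""
    # Assumption: the constant term is -.294 (see the lead-in above).
    return 0.446 * hs_average + 0.005 * ace_percentile - 0.294


def deviation(obtained: float, hs_average: float, ace_percentile: float) -> float:
    """Obtained minus predicted average; positive values mark positive deviates."""
    return obtained - predicted_college_average(hs_average, ace_percentile)


# Group means from Tables IV, V, and VI used in place of individual cases.
pred_1a = predicted_college_average(3.49, 75.64)
pred_3a = predicted_college_average(2.16, 21.19)
dev_1a = deviation(2.25, 3.49, 75.64)
print(round(pred_1a, 2), round(pred_3a, 2), round(dev_1a, 2))
```

Applied to the group means, the equation returns predicted averages close to the 1.64 and 0.78 listed for groups 1A and 3A in Table VII.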

Description of the Sub-Groups of the Experimental Population. Following
the procedure as described in Chapter II, Section E above, the total
experimental population was divided into nine groups. The means,
standard deviations, and the ranges of the sub-groups on American
Council percentiles, high school averages, obtained college averages,
predicted college averages, and the differences between obtained and
predicted college averages are presented in Tables IV, V, VI, VII, and
VIII, respectively. The standard errors of the differences between the
means of paired groups on predicted college averages are shown in
Table IX and Chart 2. Table X and Chart 3 present the standard errors
of the differences between the means of the paired groups on the
differences between the predicted college averages and the obtained
college averages. Since the major concern of this study is with the
deviates, consideration will be given only to groups 1A, 1C, 3A, 3C,
A and C. By the elimination of the non-deviating groups any differences
in the characteristics of the extreme groups should be brought out more
clearly. This procedure was tested by calculating the statistical
differences between the means of the distributions of predicted college
averages of the groups in every paired combination possible. By
referring to Table IX and Chart 2, it will be noted that the standard
error of the difference between the means of groups 1A and 3A, between
1A and 3C, between 1C and 3C, and between 1C and 3A is in each case
less than one-fifth of the respective difference between the means.
Since the 1A and the 1C
groups are, by definition, those groups com- 

5 A sample page of the master data sheet is shown in the
Appendix.


TABLE III
MEANS, STANDARD DEVIATIONS, AND RANGES OF THE EXPERIMENTAL GROUP (382 CASES)

                                                                  Standard
Measure                                  Range            Mean    deviation
1. College grade point average           -0.37 to 2.83    1.26     0.54
2. High school grade point average        1.36 to 4.00    2.74     0.55
3. American Council percentile score        01 to 98      46.65   26.89





TABLE IV

MEANS, STANDARD DEVIATIONS, AND RANGES OF THE SUB-GROUPS AND GROUPINGS OF THE
EXPERIMENTAL POPULATION ON THE AMERICAN COUNCIL ON EDUCATION
PSYCHOLOGICAL EXAMINATION (PERCENTILE SCORES)

Group or Grouping           Number    Range       Mean
1A                             35     34 to 97    75.64
1B                             41     14 to 98    66.65
1C                             31     23 to 96    70.89
2A                             34     02 to 92    40.15
2B                             66     06 to 90    45.91
2C                             50     11 to 92    50.00
3A                             42     02 to 53    21.19
3B                             47     01 to 84    26.65
3C                             36     01 to 81    33.47
A: All positive deviates      111     02 to 97    44.17
B: All non-deviates           154     01 to 98    45.55
C: All negative deviates      117     01 to 96    50.45
Total population              382     01 to 98    46.65

TABLE V

MEANS, STANDARD DEVIATIONS, AND RANGES OF THE SUB-GROUPS AND GROUPINGS OF THE
EXPERIMENTAL POPULATION ON HIGH SCHOOL GRADE POINT AVERAGES

                                                                Standard
Group or Grouping           Number    Range           Mean      deviation
1A                             35     2.71 to 4.00    3.49      0.24
1B                             41     2.81 to 3.86    3.36      [illegible]
1C                             31     2.74 to 3.81    3.28      [illegible]
2A                             34     2.21 to 3.29    2.82      [illegible]
2B                             66     2.29 to 3.21    2.74      [illegible]
2C                             50     2.11 to 3.33    2.71      [illegible]
3A                             42     1.43 to 2.83    2.16      [illegible]
3B                             47     1.36 to 2.70    2.16      [illegible]
3C                             36     1.41 to 2.75    2.23      [illegible]
A: All positive deviates      111     1.43 to 4.00    [illegible]  [illegible]
B: All non-deviates           154     1.36 to 3.86    2.73      [illegible]
C: All negative deviates      117     1.41 to 3.81    [illegible]  [illegible]
Total population              382     1.36 to 4.00    2.74      [illegible]

TABLE VI

MEANS, STANDARD DEVIATIONS, AND RANGES OF THE SUB-GROUPS AND GROUPINGS OF THE
EXPERIMENTAL POPULATION ON OBTAINED COLLEGE AVERAGES

                                                                Standard
Group or Grouping           Number    Range            Mean     deviation
1A                             35      1.74 to 2.83    2.25     0.27
1B                             41      1.24 to 2.18    1.65     0.22
1C                             31      0.39 to 1.59    1.07     0.25
2A                             34      1.43 to 2.48    1.75     0.26
2B                             66      0.96 to 1.58    1.28     [illegible]
2C                             50      0.15 to 1.15    0.77     [illegible]
3A                             42      0.91 to 2.11    1.43     [illegible]
3B                             47      0.51 to 1.30    0.93     [illegible]
3C                             36     -0.37 to 0.85    0.48     [illegible]
A: All positive deviates      111      0.91 to 2.83    1.79     [illegible]
B: All non-deviates           154      0.51 to 2.18    1.27     [illegible]
C: All negative deviates      117     -0.37 to 1.59    0.76     [illegible]












TABLE VII

MEANS, STANDARD DEVIATIONS, AND RANGES OF THE SUB-GROUPS AND GROUPINGS OF THE
EXPERIMENTAL POPULATION ON PREDICTED COLLEGE AVERAGES

                                                                Standard
Group or Grouping           Number    Range           Mean      deviation
1A                             35     1.34 to 1.95    1.64        0.18
1B                             41     1.33 to 1.86    1.54        0.14
1C                             31     1.33 to 1.80    1.53        0.14
2A                             34     1.02 to 1.31    [illegible] 0.07
2B                             66     1.01 to 1.31    1.16        0.10
2C                             50     1.01 to 1.31    [illegible] 0.09
3A                             42     0.41 to 1.00    0.78        0.17
3B                             47     0.36 to 1.00    0.81        0.15
3C                             36     0.63 to 1.00    0.87        0.11
A: All positive deviates      111     0.41 to 1.95    1.17        0.39
B: All non-deviates           154     0.36 to 1.86    1.16        0.30
C: All negative deviates      117     0.63 to 1.80    1.17        0.27
Total population              382     0.36 to 1.95    1.17        0.32

TABLE VIII

MEANS, STANDARD DEVIATIONS, AND RANGES OF THE SUB-GROUPS AND GROUPINGS OF THE
EXPERIMENTAL POPULATION ON THE DIFFERENCES BETWEEN THE OBTAINED AND THE
PREDICTED COLLEGE AVERAGES

                                                                Standard
Group or Grouping           Number    Range             Mean    deviation
1A                             35      0.34 to  1.43     0.62     0.24
1B                             41     -0.11 to  0.33     0.12     0.14
1C                             31     -0.96 to -0.13    -0.45     0.20
2A                             34      0.34 to  1.21     0.58     0.23
2B                             66     -0.10 to  0.33     0.13     0.13
2C                             50     -1.06 to -0.13    -0.39     0.25
3A                             42      0.37 to  1.33     0.66     0.25
3B                             47     -0.09 to  0.33     0.13     0.14
3C                             36     -1.28 to -0.13    -0.38     0.24
A: All positive deviates      111      0.34 to  1.43     0.62     0.24
B: All non-deviates           154     -0.11 to  0.33     0.13     0.13
C: All negative deviates      117     -1.28 to -0.13    -0.45     0.24
Total population              382     -1.28 to  1.43     0.11     0.44

TABLE IX

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON PREDICTED COLLEGE AVERAGES

Groups Compared       Mean of    Mean of    Difference    S. E.    D / S. E.   Chances
Group 1   Group 2     Group 1    Group 2    between       diff.    diff.*      in 100
                                            means
1A        1C          1.64       1.53       0.11          0.04       2.7       99.7
1A        3A          1.64       0.78       0.86          0.04      21.41      100
1A        3C          1.64       0.87       0.77          0.04      21.68      100
1C        3A          1.53       0.78       0.75          0.04      20.64      100
1C        3C          1.53       0.87       0.66          0.03      21.21      100
3A        3C          0.78       0.87       0.09          0.03      -2.81      99.7
A         C           1.17       1.17       0.00          0.04       0.00      0

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.






















TABLE X

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE DIFFERENCES BETWEEN
PREDICTED COLLEGE AVERAGES AND OBTAINED COLLEGE AVERAGES

Groups Compared       Mean of    Mean of    Difference    S. E.    D / S. E.   Chances
Group 1   Group 2     Group 1    Group 2    between       diff.    diff.*      in 100
                                            means
1A        1C           0.62      -0.45      1.07          0.05      19.75      100
1A        3A           0.62       0.66      0.04          0.06      -0.71      76
1A        3C           0.62      -0.38      1.00          0.06      17.55      100
1C        3A          -0.45       0.66      1.11          0.05     -21.06      100
1C        3C          -0.45      -0.38      0.07          0.05      -1.30      90
3A        3C           0.66      -0.38      1.04          0.06      18.72      100
A         C            0.62      -0.45      1.07          0.03      33.65      100

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


CHART 2

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
PREDICTED COLLEGE AVERAGES

[Chart: groups 1A, 1C, 3A, and 3C and the totals A and C joined by lines.
Legend, by size of D / S. E. diff.: less than 1; 1.0 to 2.0; 2.1 to 3.0;
3.1 and over. Arrowhead: greater than.]


prising the students whose college work was 
predicted to be good, and since the data as 
presented in the table and graph indicate the 
means of both 1A and 1C to be greater than 
either 3A or 3C, it is obvious that the pro- 
cedure followed produced groups, which with 
statistical certainty possessed differences 
greater than zero, in the measure used, namely 
predicted college grade point average. Like- 
wise, reference to Table X, and Chart 3, will 


CHART 3

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
PREDICTED AND OBTAINED COLLEGE AVERAGES

[Chart: groups 1A, 1C, 3A, and 3C and the totals A and C joined by lines.
Legend, by size of D / S. E. diff.: less than 1; 1.0 to 2.0; 2.1 to 3.0;
3.1 and over. Arrowhead: greater than.]


also show the statistical certainty of a differ- 
ence greater than zero between the means of 
groups 1A and 1C, 1A and 3C, 3A and 3C, 
and 3A and 1C on distributions of their re- 
spective differences between predicted and 
obtained college grade point averages. In 
this case, the means of either of the 1A and 
3A groups are greater than either of the 
means of the 3C and 1C groups. This indi- 
cates that the procedure of selecting extreme 










groups produced groups significantly different 
in the amount and direction of the differences 
between obtained college averages and pre- 
dicted college averages. 


CHAPTER IV 


STATISTICAL ANALYSIS AND 
FINDINGS 


The characteristics of the various sub- 
groups of the experimental population, as 
indicated by the measures applied and by the 
computation of the reliability of the differ- 
ences of paired groups, will be presented in 
this chapter under two major headings, 
namely, academic measures and non-academic 
measures. 


I. ACADEMIC MEASURES 


Mechanics of English Usage—Barrett— 
Ryan English Test. 1. The data as presented 
in Table XI and Chart 4 show no completely 
reliable differences between the total positive 
deviates (A) and the total negative deviates 
(C) in their use of the mechanics of English, 
although there are about sixty-nine chances 
in a hundred that the positive deviates excel 
the negative deviates on this measure. 

Wagner¹ in her study of variations from
predicted college achievement reported a 
superiority of the total positive deviates over 
the total negative deviates somewhat greater 
than that found in this investigation. How- 
ever, both findings are alike in that no 
statistically reliable differences are indicated. 

2. The good positive deviates (1A) demon- 
strate a reliable superiority over all groups 
except the good negative deviates (1C). 
However, an approximately reliable difference 


1 Mazie Earle Wagner, “Studies in Academic Motivation,” 
Studies in Articulation of High School and College (Univer- 
sity of Buffalo Studies, XIII; Buffalo, N. Y.: University of 
Buffalo, 1936), p. 198. 




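CHART 4

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
THE BARRETT-RYAN ENGLISH TEST

[Chart: groups 1A, 1C, 3A, and 3C and the totals A and C joined by lines.
Legend, by size of D / S. E. diff.: less than 1; 1.0 to 2.0; 2.1 to 3.0;
3.1 and over. Arrowhead: greater than.]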


is shown (98 chances in 100) in favor of the 
good positive deviates over the good negative 
deviates. 

3. The poor negative deviates (3C) tend 
to excel the poor positive deviates (3A), but 
without statistical reliability (93 chances in 
100). 

4. The students who were predicted to do 
work below the average and who exceeded 
expectations (3A) show the poorest rating of 
all groups in their knowledge of the mechanics 
of English. The mean for this group was 81.45 


TABLE XI
DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE BARRETT-RYAN ENGLISH TEST

Groups Compared       Mean of       Mean of       Difference    S. E.         D / S. E.   Chances
Group 1   Group 2     Group 1       Group 2       between       diff.         diff.*      in 100
                                                  means
1A        1C          112.75        105.07         7.68         [illegible]    2.08       98
1A        3A          112.75         81.45        31.30         [illegible]    8.29       100
1A        3C          112.75         87.30        25.45         [illegible]    6.50       100
1C        3A          105.07         81.45        23.62         [illegible]    6.67       100
1C        3C          105.07         87.30        17.77         [illegible]    4.82       100
3A        3C           81.45         87.30         5.85         [illegible]   -1.55       93
A         C           [illegible]   [illegible]    1.35         [illegible]    0.51       69

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.
















as compared with 87.30, 105.07, and 112.75, respectively, for the
deviating groups classified as poor negatives (3C), good negatives
(1C), and good positives (1A).

These findings indicate a superiority in English usage, as measured by
the Barrett-Ryan test, of students of good promise (1A and 1C) over
students of poor promise (3A and 3C) at the time of college entrance.
They also show that students of good promise who improve the quality of
their academic performance (1A) demonstrate better mastery of English
usage than similar students whose achievements fall below the level
predicted (1C). Thus, for students of good promise, English usage seems
to be a quality associated with, or existing concurrently with, the
improvement of academic record. However, for the students of poor
promise (3A and 3C) this situation does not hold true. There is thus
indicated a need for further investigation of the causes, other than
the differences in the mastery of the mechanics of English, that tend
to produce better scholastic records than those predicted, especially
in the case of students of poor promise.

Reading Comprehension—The Shank 
Tests of Reading Comprehension. Reading is 
generally considered the most important tool 
in the learning process, especially on the col- 
lege level, because students must depend very 
largely upon their ability to acquire facts and 
ideas from the printed page in order to achieve 
the objectives set for them. Shuttleworth²
reported a zero order coefficient of correlation 
of only 0.462 between reading comprehension 
and freshman scholarship grade point average. 
The present investigation demonstrates a 


2 Frank K. Shuttleworth, “Environmental and Character
Factors Involved in Scholastic Success,” Journal of Educational
Psychology, XX (September, 1929), 427.




similar relationship, for, as will be noted by 
referring to Table XII, and Chart 5, both the 
positively and the negatively deviating groups 
of students classified as above average (1A 
and 1C) showed means of 78.69 and 75.20, 
respectively, results which were reliably 
higher than the respective means of the posi- 
tively and negatively deviating groups of 
below average students (3A and 3C), which 
were 57.28 and 65.20, respectively. The in- 
ference is that students of good promise 


CHART 5

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
SHANK TESTS OF READING COMPREHENSION

[Chart: groups 1A, 1C, 3A, and 3C and the totals A and C joined by lines.
Legend, by size of D / S. E. diff.: less than 1; 1.0 to 2.0; 2.1 to 3.0;
3.1 and over. Arrowhead: greater than.]


TABLE XII

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SHANK TESTS
OF READING COMPREHENSION

Groups Compared       Mean of    Mean of    Difference    S. E.    D / S. E.   Chances
Group 1   Group 2     Group 1    Group 2    between       diff.    diff.*      in 100
                                            means
1A        1C          78.69      75.20       3.49         3.32      1.05       85
1A        3A          78.69      57.28      21.41         3.00      7.13       100
1A        3C          78.69      65.20      13.49         3.14      4.29       100
1C        3A          75.20      57.28      17.92         3.17      5.65       100
1C        3C          75.20      65.20      10.00         3.30      3.03       100
3A        3C          57.28      65.20       7.92         2.99     -2.65       99.6
A         C           66.70      68.62       1.92         2.06     -0.93       82

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.







possess markedly superior ability to read 
comprehendingly. 

Further inferences that may be drawn from 
the data of this study are presented below. 


1. The possession of differing degrees of 
reading skill appears to have but little, if 
any, relationship to variation from predicted 
scholastic records. The chances are only 
about eighty-three in one hundred that there 
is a difference greater than zero between the 
total positive (A) and total negative (C) 
groups, in favor of the latter. There was a 
relatively small difference of 1.93 between 
their means. 


2. The chances are but eighty-five in one 
hundred that good positive deviates (1A) 
excel the good negative deviates (1C) on 
this measure. 


3. Poor negative deviates (3C) are almost 
reliably superior (99.6 chances in 100) to 
poor positive deviates (3A). 


4. Relatively high reading comprehension appears to be a significant
characteristic of students of good promise. However, no convincing
evidence is available to prove that it constitutes a factor which
distinguishes the positive deviates (A) from the negative deviates (C).
Had this been the case, then the poor negative deviates (3C), with
better reading ability than the poor positive deviates (3A) to begin
with, should have obtained better grades in college. It is obvious that
the opposite was true. Some other qualities than reading comprehension
must have been influencing the poor positive deviates (3A) to exceed
expectancy, especially in view of the fact that their percentile scores
on the American Council on Education Psychological



Examination are reliably inferior to those of
the poor negative deviates (3C).³

Language Mechanics and Literature—
Sones—Harry High School Achievement Test, 
Part I. This test contains sections which 
represent a sampling of several aspects of 
English including grammatical constructions, 
word meanings, abbreviations and prefixes, 
foreign phrases, reading comprehension, and 
literary forms, authorship, characters, pass- 
ages, and themes. It therefore tests for liter- 
ary knowledge, reading skills, and language 
usage. 

The observations concerning the data secured from this test follow:

1. It is to be expected that the findings with reference to group
differences would approximate those recorded in connection with the
Barrett-Ryan English Test and the Shank Reading Tests. An examination
of Table XIII and Chart 6 in connection with Charts 4 and 5 will show
that this expectation is correct.

2. There are no completely significant differences between the positive
(A) and negative deviates (C) as a whole.

3. Reliable differences are present only between students of good
promise and students of poor promise, for both the good positive
deviates and the good negative deviates (1A and 1C) are definitely
superior to both the poor positive (3A) and the poor negative
deviates (3C).

4. Some indication of a possible superiority
of the good positive deviates (1A) over the


3 By referring to Table IV it will be noted that the mean
American Council percentile score of the poor negative deviates
(3C) is 33.47, and that for the poor positive deviates
(3A) it is 21.19. The difference between the means is 12.28,
which represents a ratio of the difference to its standard error
of 3.20, and therefore, practical reliability.


TABLE XIII

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SONES-HARRY HIGH SCHOOL
ACHIEVEMENT TEST, PART I, LANGUAGE AND LITERATURE

Groups Compared       Mean of    Mean of    Difference    S. E.         D / S. E.   Chances
Group 1   Group 2     Group 1    Group 2    between       diff.         diff.*      in 100
                                            means
1A        1C          84.41      80.78       3.63         [illegible]    0.77       77
1A        3A          84.41      52.77      31.64         [illegible]    8.18       100
1A        3C          84.41      55.36      29.05         [illegible]    7.40       100
1C        3A          80.78      52.77      28.01         [illegible]    6.34       100
1C        3C          80.78      55.36      25.42         [illegible]    5.69       100
3A        3C          52.77      55.36       2.59         [illegible]   -0.73       76
A         C           68.21      65.89       2.32         [illegible]    0.95       83

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.














CHART 6

RELIABILITY OF DIFFERENCES BETWEEN
THE MEANS OF PAIRED GROUPS FOR
SONES-HARRY ACHIEVEMENT TEST
LANGUAGE AND LITERATURE

[Chart: groups 1A, 1C, 3A, and 3C and the totals A and C joined by lines.
Legend, by size of D / S. E. diff.: less than 1; 1.0 to 2.0; 2.1 to 3.0;
3.1 and over. Arrowhead: greater than.]


good negative deviates (1C) (77 chances in 100) and of the poor
negative deviates (3C) over the poor positive deviates (3A) (about
76 chances in 100) is likewise noted. These differences are
considerably less significant than they were on the Shank and the
Barrett-Ryan tests.

5. The total positively deviating population (A) is indicated as
possibly better than the total negative group (C) on the Sones-Harry
test (83 chances in 100) and on the Barrett-Ryan test (69 chances in
100), while the reverse is true on the Shank tests (83 chances in 100).
These tendencies may be interpreted as indicating that there is more
difference between groups in the language phases of these tests than in
the knowledge aspects. All tests demonstrate completely reliable
differences only between students of good promise and students of poor
promise, although on the Barrett-Ryan test the good positive deviates
(1A) are nearly reliably better than the good negative deviates (1C)
(98 chances in 100) and the poor negative deviates (3C) are
approximately better than the poor positive deviates (3A) (93 chances
in 100).




6. It is not reasonable to infer that the 
various aspects of language usage account for 
the variation of students’ performance from 
that predicted for them, especially in view 
of the findings which show consistently, 
although not with complete reliability, that 
the group of good positive deviates (1A) is 
superior to its corresponding group of nega- 
tive deviates (1C) when the exact opposite 
is true for the poor negative deviates (3C) 
on all three tests involving language usage. 
Other factors such as motivation, environ- 
mental influences, personal limitations and 
adaptations, may be contributing to the rec- 
orded variation between predicted perform- 
ance and actual achievement. 

General Mathematics—Sones—Harry High 
School Achievement Test, Part II. Analysis 
of group achievement in mathematics as meas- 
ured by the Sones—Harry Test resulted in the 
following observations: 

1. The superior students (1A and 1C) 
possess superior ability in mathematics. 
Whether it is the mastery of mathematics 
or the possession of those qualities which 
facilitate such mastery that differentiates the 
superior student from the student of lesser 
achievement is not shown in this study. In- 
vestigators report a positive relationship 
between mathematical ability and academic 
adjustment represented by zero order coeffi- 
cients of correlation varying around a median 
of about 0.40. 

Segel⁴ surveyed the investigations concerned
with mathematics ability and subsequent
general college scholarship. He reports coefficients
of correlation of 0.12 and 0.42 found
by Remmers and Stoddard, respectively, when
the mathematics aptitude section of the Iowa
Placement Examination was used, and coefficients
of 0.38 and 0.35 by Brown and by
Stoddard, respectively, when the mathematics
training section of the same test was applied.
He reports also that Dvorak and Salyer
found a coefficient of 0.58 when both sections
of the same test were combined.

Douglass⁵ found a correlation coefficient of
0.44 between high school marks in mathematics
and freshman grade point average. He
reports⁶ that Brammel and that Lauer and
Evans found zero order coefficients of correlation
of 0.39 and 0.47, respectively, in
similar investigations.

4 Segel, Prediction of Success in College, p. 62.

5 Douglass, The Relation of High School Preparation and
Certain Other Factors to Academic Success at the University
of Oregon, p. 25.

6 Loc. cit.


TABLE XIV

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SONES-HARRY HIGH SCHOOL
ACHIEVEMENT TEST, PART II, MATHEMATICS

Groups Compared     Mean of   Mean of   Difference   S.E.    D / S.E.   Chances
Group 1  Group 2    Group 1   Group 2   between      diff.   diff.*     in 100
                                        means

1A       1C         44.00     33.31     10.69        3.58    2.99       99.9
1A       3A         44.00     24.05     19.95        3.31    6.03       100
1A       3C         44.00     26.69     17.31        3.19    5.42       100
1C       3A         33.31     24.05     9.26         3.05    3.03       100
1C       3C         33.31     26.69     6.62         2.93    2.26       98.7
3A       3C         24.05     26.69     2.64         2.59    -1.02      84
A        C          32.47     32.76     0.29         1.97    -0.15      56

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


CHART 7

RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR
SONES-HARRY ACHIEVEMENT TEST, PART II, MATHEMATICS

[Chart not reproduced. Legend classes: less than 1; 1.0 to 2.0; 2.1 to 3.0; 3.1 and over; "greater than" marker.]


2. The data (Table XIV and Chart 7)
definitely demonstrate a reliable superiority
of the good positive deviates (1A) over all
other groups.

3. The good negative deviates (1C) are
nearly (98.7 chances in 100) reliably superior
to the poor negative deviates (3C). They
are definitely better than the students of poor
promise whose work was better than predicted
(3A), since they obtained an average of 33.31
as compared with a mean of 24.05 for the
inferior group (3A).

4. For the students of poor promise who 
fell short of expectancy (3C) there is noted 
a tendency (84 chances in 100) to do better
work in mathematics than similar students 
whose records exceeded predictions (3A). 

5. As a whole, there is no significant dif- 
ference between the knowledge of mathe- 
matics of those students who did better work 
than expected (A) and the knowledge of 
mathematics of those who did poorer work 
than was predicted for them (C). 

Mathematical Reasoning — Progressive 
Mathematics Tests, Advanced, Form A, Test 
3—Mathematical Reasoning. The only dif- 
ference of any importance between the find- 
ings for the mathematical reasoning section 
of the Progressive tests and the mathematics 
section of the Sones-Harry test lies between 
the good positive and good negative groups 
(1A and 1C). This statement may be verified 
by comparing Table XV and Chart 8 with 
Table XIV and Chart 7. For the Sones-Harry 
test there was found a critical ratio of the 
reliability of the difference of the means of 
2.99, which indicates practical statistical cer- 
tainty of a difference greater than zero. On 
the other hand, in the reasoning section of 
the Progressive test the critical ratio was 
2.16, which indicates approximately ninety- 
eight chances in one hundred that there is a 
reliable difference. For practical purposes it 
is justifiable to assume that both tests are 
relatively comparable in all of the differences 
that are evidenced. The superiority of students
of good promise (1A and 1C), regardless
of deviation, over poor students (3A and
3C) is the dominant fact observed, just as
it was on the Sones-Harry test.


TABLE XV

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE PROGRESSIVE MATHEMATICS
TESTS, FORM A, TEST 3, MATHEMATICAL REASONING

Groups Compared     Mean of   Mean of   Difference   S.E.    D / S.E.   Chances
Group 1  Group 2    Group 1   Group 2   between      diff.   diff.*     in 100
                                        means

1A       1C         46.31     42.47     3.84         1.77    2.16       98.3
1A       3A         46.31     36.05     10.26        1.69    6.07       100
1A       3C         46.31     37.78     8.53         1.71    5.00       100
1C       3A         42.47     36.05     6.42         1.84    3.49       100
1C       3C         42.47     37.78     4.69         1.86    2.53       99.4
3A       3C         36.05     37.78     1.73         1.77    -0.98      83
A        C          40.52     40.84     0.32         1.18    -0.27      60

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


CHART 8

RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR
PROGRESSIVE MATHEMATICS TEST, ADVANCED, FORM A, TEST 3, MATHEMATICAL REASONING

[Chart not reproduced. Legend classes: less than 1; 1.0 to 2.0; 2.1 to 3.0; 3.1 and over; "greater than" marker.]

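
The relation between a critical ratio and the "chances in 100" reported in the tables can be illustrated with a short sketch. This section does not state how the chances were obtained; the sketch assumes they are the area of the normal curve lying below the absolute value of the ratio, an assumption that reproduces the tabled figures to within about one chance in 100. The function names are illustrative only.

```python
from statistics import NormalDist


def critical_ratio(mean_1, mean_2, se_diff):
    """Ratio of the difference between two means to its standard error."""
    return (mean_1 - mean_2) / se_diff


def chances_in_100(ratio):
    """Chances in 100 that the true difference exceeds zero.

    Assumption (not stated in the article): the one-sided area of the
    standard normal curve below the absolute value of the ratio.
    """
    return 100.0 * NormalDist().cdf(abs(ratio))


if __name__ == "__main__":
    # Groups 1A and 1C on the Sones-Harry mathematics section (Table XIV).
    ratio = critical_ratio(44.00, 33.31, 3.58)
    print(round(ratio, 2), round(chances_in_100(ratio), 1))
```

A ratio of zero gives 50 chances in 100 under this assumption, that is, no evidence of a difference in either direction.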
Mathematical Computation —Progressive 
Mathematics Tests, Advanced Form A, Test 
4—Mathematical Computation. 1. For this 
test differences between groups are found
similar to those observed in connection with
the Sones-Harry Test, Part II, and the
reasoning section of the Progressive tests.
This statement may be verified by comparing
Table XVI and Chart 9 with Tables XIV
and XV and Charts 7 and 8.

2. The difference between the means (5.69)
in favor of the good positive deviates (1A)
over the good negative deviates (1C) is less
pronounced on the computation section of the
Progressive test than on the Sones-Harry test,
where the difference between the means is
10.69. However, the difference is greater for
computation than it is for reasoning (3.84).

3. For the poor negative deviates (3C)
on all three tests some superiority over the
poor positive deviates (3A) is indicated.
Only on the Sones-Harry test is reliability
approximated, with eighty-four chances in one
hundred that the difference is greater than
zero.

4. No significant differences between the 
total positive deviates (A) and total negatives 
(C) are found on any one of the three mathe- 
matics tests. 

Wagner⁷ reported a more marked, although
not a completely reliable, relationship between
mathematics and deviation from predicted
college averages than was found in this investigation.
Her data indicated a superiority
of positive deviates over negative deviates by
better than ninety-nine chances in one hundred
for boys. However, the negatively deviating
girls were indicated as superior by sixty-nine
chances in one hundred. Had Wagner
combined the boys and girls and computed
differences accordingly it is probable that
findings of the two investigations would have
been in closer apparent agreement. Moreover,
for neither investigation are the reliabilities of
the differences complete enough to warrant the
conclusion that the lack of correspondence
between the findings is due to factors other
than sampling.

7 Wagner, op. cit., p. 198.


TABLE XVI

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE PROGRESSIVE MATHEMATICS
TESTS, FORM A, TEST 4, MATHEMATICAL COMPUTATION

Groups Compared     Mean of   Mean of   Difference   S.E.    D / S.E.   Chances
Group 1  Group 2    Group 1   Group 2   between      diff.   diff.*     in 100
                                        means

1A       1C         62.19     56.50     5.69         3.27    1.74       96
1A       3A         62.19     46.95     15.24        3.04    5.02       100
1A       3C         62.19     47.92     14.27        2.69    5.31       100
1C       3A         56.50     46.95     9.55         3.58    2.67       99.6
1C       3C         56.50     47.92     8.58         3.2     2.61       99.5
3A       3C         46.95     47.92     0.97         3.06    -0.32      62
A        C          53.31     53.31     0.00         1.93    0.00       0

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


CHART 9

RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR
PROGRESSIVE MATHEMATICS TEST, ADVANCED, FORM A, TEST 4, MATHEMATICAL COMPUTATION

[Chart not reproduced. Legend classes: less than 1; 1.0 to 2.0; 2.1 to 3.0; 3.1 and over; "greater than" marker.]

Natural Science—Sones—Harry High 
School Achievement Test, Form A, Section 
III. 1. This natural science test samples the
subject’s knowledge of many types of natural
phenomena. From experience one would
expect the students of good promise to
demonstrate superiority over the students of
poor promise. The data in Table XVII and 
Chart 10 confirm this view. Each of the groups 
of students of good promise (1A and 1C) 
has a reliably, or approximately reliably, 
greater mastery of scientific facts than each 
of the groups of less able students (3A and 
3C), and the students of good promise who 
exceed their predicted record (1A) give some 
evidence (83 chances in 100) of superior 
knowledge over the students of good promise 
who did not make the record expected (1C). 

2. The positive deviates (A) as a group 
demonstrate somewhat greater knowledge of 
scientific facts than the total group of nega- 
tive deviates (C). Their mean is 1.73 points 
greater, and the chances of a difference be- 
tween the groups greater than zero are about 
eighty-eight in one hundred.

This finding compares favorably with that
of Wagner,⁸ who reported that the direction
of difference in science knowledge is in favor
of the positive deviates. She found a completely
reliable difference for the boys, and
a chance of eighty-eight in one hundred that
the positively deviating girls are superior to
those who vary negatively from predicted
college averages.

This difference may be interpreted as an
indication of the probability of a relationship
of a student’s possession of scientific knowledge
to his academic adjustment in college.

Observation of the practices followed in
evaluating students’ work leads to the view
that the possession of facts is of primary
importance and receives marked weighting
when professors estimate marks. In this connection,
reference to Chart 6, which presents
data concerning group achievement on the
language and literature test, will indicate also
a tendency of the total positively deviating
group (A) to demonstrate greater knowledge
of factual material than the negatively deviating
group as a whole (C). This difference
is viewed as having some significance since
it cannot be ascribed as a natural concomitant
of intelligence, for, as will be noted by referring
to Table IV, the negatively deviating
group has an average percentile score 6.28
points greater than the positively deviating
group.

8 Wagner, loc. cit.


TABLE XVII

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SONES-HARRY HIGH SCHOOL
ACHIEVEMENT TEST, FORM A, SECTION III, NATURAL SCIENCE

Groups Compared     Mean of   Mean of   Difference   S.E.    D / S.E.   Chances
Group 1  Group 2    Group 1   Group 2   between      diff.   diff.*     in 100
                                        means

1A       1C         41.01     38.30     2.71         2.84    0.95       83
1A       3A         41.01     31.73     9.28         2.36    3.93       100
1A       3C         41.01     32.10     8.91         2.61    3.41       100
1C       3A         38.30     31.73     6.57         2.37    2.77       99.7
1C       3C         38.30     32.10     6.20         2.62    2.36       99
3A       3C         31.73     32.10     0.37         2.09    -0.18      57
A        C          36.31     34.58     1.73         1.45    1.19       88

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


CHART 10

RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR
SONES-HARRY HIGH SCHOOL ACHIEVEMENT TEST, FORM A, SECTION III, NATURAL SCIENCE

[Chart not reproduced. Legend classes: less than 1; 1.0 to 2.0; 2.1 to 3.0; 3.1 and over; "greater than" marker.]

Social Science—Sones—Harry High School 
Achievement Test, Form A, Section IV. 1. 
The data presented in Table XVIII and Chart 
11 show some superiority of all positive de- 
viates (A) over all negative deviates (C) in 
social science knowledge, although in no case 
is the difference between the groups com- 
pletely reliable. There are ninety-four chances 
in one hundred that the total positive deviates 
have more knowledge of this type than the 
total negative deviates. This finding compares 
favorably with that reported by Wagner⁹ for
history. She found a difference favoring both
the boys (99.8 chances in 100) and the girls
(98.5 chances in 100) whose college averages 
exceeded those predicted for them. The dif- 
ference which she reported is somewhat more 
marked than that found in this investigation. 
This lack of agreement is probably due to the 
difference in the method of arranging the 
groups. She included in the deviating groups 
students whose obtained college grade varied 
from the mean by one or more probable errors 
of the distribution, while in the present study 
the deviates were classified as those students 
who varied by one-half of a standard devia- 
tion, or .74 of a probable error, of the distri- 
bution. Thus, Wagner’s deviating groups con- 
tained more extreme cases than the groups in 
the present investigation. Consequently, a 
greater difference in the characteristics of the 
positive (A) and negative (C) deviates is to 
be expected. 
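
The statement that one-half of a standard deviation equals .74 of a probable error can be checked with a brief sketch. It assumes a normal distribution, in which the probable error is the distance from the mean that encloses the middle half of the cases (about 0.6745 of a standard deviation); the names used are illustrative only and do not come from the article.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Probable error expressed in standard deviation units: the 75th
# percentile of the standard normal curve, about 0.6745.
PE_PER_SD = STD_NORMAL.inv_cdf(0.75)


def sd_to_pe(sd_units):
    """Convert a distance in standard deviations to probable errors."""
    return sd_units / PE_PER_SD


def share_beyond(sd_units):
    """Share of a normal distribution farther than sd_units from the mean."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(sd_units))


if __name__ == "__main__":
    print(round(sd_to_pe(0.5), 2))            # half an S.D. in P.E. units
    print(round(share_beyond(0.5), 3))        # cutoff of the present study
    print(round(share_beyond(PE_PER_SD), 3))  # cutoff of one probable error
```

Under these assumptions a cutoff of one probable error leaves about half of the distribution outside it, whereas a cutoff of one-half of a standard deviation leaves roughly 62 per cent, which is in keeping with the remark that Wagner's deviating groups contained the more extreme cases.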

2. There are about ninety-nine chances in
one hundred that the good positive deviates
(1A) know more social facts than good negative
deviates (1C), and eighty-four chances
in one hundred that the poor positive deviates
(3A) have such knowledge superior to the
poor negative deviates (3C).

9 Wagner, op. cit., p. 198.


TABLE XVIII

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SONES-HARRY HIGH SCHOOL
ACHIEVEMENT TEST, FORM A, PART IV, SOCIAL SCIENCE

Groups Compared     Mean of   Mean of   Difference   S.E.    D / S.E.   Chances
Group 1  Group 2    Group 1   Group 2   between      diff.   diff.*     in 100
                                        means

1A       1C         63.99     55.30     8.69         3.80    2.28       98.9
1A       3A         63.99     45.27     18.72        3.41    5.48       100
1A       3C         63.99     41.87     21.12        3.52    6.29       100
1C       3A         55.30     45.27     10.03        3.74    2.68       99.6
1C       3C         55.30     41.87     13.41        3.84    3.50       100
3A       3C         45.27     41.87     3.40         3.45    0.99       84
A        C          52.25     48.74     3.51         2.19    1.60       94

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


CHART 11

RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR
SONES-HARRY HIGH SCHOOL ACHIEVEMENT TEST, FORM A, SECTION IV, SOCIAL SCIENCE

[Chart not reproduced. Legend classes: less than 1; 1.0 to 2.0; 2.1 to 3.0; 3.1 and over; "greater than" marker.]

3. The findings also strongly indicate that 
students predicted as better than average (1A 
and 1C) have a greater mastery of social 
science information than the students of poor
promise (3A and 3C). It will be noted that
both the positive and the negative groups of
good students (1A and 1C) obtained mean
scores on the social science test of 63.99 and
55.30, respectively, while the poor negative
deviates (3C) received an average score of 
41.87, and the poor positive deviates (3A) 
averaged 45.27. 

These observations tend to confirm the view 
expressed in connection with natural science, 
namely, that there is some likelihood that a 
knowledge of facts tends to contribute to a 
student’s academic adjustment in college, at 
least in so far as adjustment is measured by 
professors’ marks. 

General Academic Achievement—Sones— 
Harry High School Achievement Test, Form 
A—(Total Score). 1. Differences between 
the total scores made by the various groups 
on the Sones—Harry test are given in Table 
XIX and Chart 12. When Chart 12 is compared
with Charts 6, 7, 8, 10, and 11, which
present graphically the differences in achieve- 
ment on the tests of knowledge of subject 
matter primarily, a marked similarity between 
all the charts is noted. This likeness was to 
be expected because of the relationship of the 
sub-divisions of the Sones—Harry test to the 
total score of the test, and because of the 
measurement of like or similar achievements 
by the mathematical reasoning section of the 
Progressive Mathematics test and the mathe- 
matics section of the Sones—Harry test. The 
total score of the Sones—Harry test may there- 
fore be presented as a composite measure of 
the attainments of groups in knowledge fields. 

2. Above average groups (1A and 1C) are
markedly superior to the below average groups
(3A and 3C) in scholastic achievements as
measured by the Sones-Harry test. Table
XIX shows the averages of 231.18 and 207.07
for the former, and 151.22 and 155.29 for
the latter.


TABLE XIX

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SONES-HARRY HIGH SCHOOL
ACHIEVEMENT TEST, FORM A (TOTAL SCORE)

Groups Compared     Mean of   Mean of   Difference   S.E.    D / S.E.   Chances
Group 1  Group 2    Group 1   Group 2   between      diff.   diff.*     in 100
                                        means

1A       1C         231.18    207.07    24.11        10.43   2.31       98.9
1A       3A         231.18    151.22    79.96        8.67    9.23       100
1A       3C         231.18    155.29    75.89        9.12    8.32       100
1C       3A         207.07    151.22    55.85        9.52    5.87       100
1C       3C         207.07    155.29    51.78        9.93    5.21       100
3A       3C         151.22    155.29    4.07         8.06    -0.51      69
A        C          187.14    180.60    6.54         6.45    1.01       84

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


CHART 12

RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR
SONES-HARRY HIGH SCHOOL ACHIEVEMENT TEST, FORM A (TOTAL)

[Chart not reproduced. Legend classes: less than 1; 1.0 to 2.0; 2.1 to 3.0; 3.1 and over; "greater than" marker.]


3. Good positive deviates (1A) are indicated
as having more knowledge than good
negative deviates (1C) with a nearly reliable
difference (98.9 chances in 100) between the
means of the two groups of 24.11.

4. There is no significant difference between
the positive and negative deviates
among the students of poor promise (3A and
3C).

5. A probable superiority of all positive 
deviates (A) over the total negative deviates 
(C) is indicated (84 chances in 100). There 
is a difference between their means of 6.54. 
This difference may be interpreted as signi- 
fying that subject matter knowledge probably 
influences students to do work in college bet- 
ter than that predicted for them. However, 
when Table XIX and Chart 12 are compared 
with Table XX and Chart 13, which present 
the differences between the groups on their 
original high school records, this view seems 
hardly tenable, for a striking similarity of the 
difference is noted. The groups between which 
the differences are completely reliable, and 
those which are nearly reliable, correspond 
exactly. Moreover, while on the Sones—Harry 
test the poor negative deviates (3C) are in- 
dicated as excelling the poor positive deviates 
(3A) with approximately sixty-nine chances 
in one hundred of there being a difference 
greater than zero, in terms of high school 
record the chances are eighty-four in one hun- 
dred that this is true. Furthermore, on the 
Sones—Harry test the total group of positive 
deviates (A) is indicated as superior to the 
total group of negative deviates (C), by 
eighty-four chances in one hundred, while the 
high school record shows seventy-nine chances 
in one hundred that the same is true. On the 
basis of these comparisons it seems more 
justifiable to postulate the view that the differences
that are evidenced are more a function
of the procedure followed in the original
organization of the groups than they are char- 
acteristics which influence variations in pre- 
dicted college performance. 





TABLE XX

DIFFERENCES BETWEEN THE MEANS OF THE SUB-GROUPS ON HIGH SCHOOL AVERAGES

Groups Compared     Mean of   Mean of   Difference   S.E.    D / S.E.   Chances
Group 1  Group 2    Group 1   Group 2   between      diff.   diff.*     in 100
                                        means

1A       1C         3.49      3.28      0.21         0.08    [illeg.]   99.7
1A       3A         3.49      2.16      1.33         0.08    17.15      100
1A       3C         3.49      2.23      1.26         0.07    18.73      100
1C       3A         3.28      2.16      1.12         0.08    14.47      100
1C       3C         3.28      2.23      1.05         0.07    15.65      100
3A       3C         2.16      2.23      0.07         0.07    -1.02      84
A        C          2.78      2.93      0.06         0.07    0.81       79

* The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


CHART 13

RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR
HIGH SCHOOL AVERAGES

[Chart not reproduced. Legend classes: less than 1; 1.0 to 2.0; 2.1 to 3.0; 3.1 and over; "greater than" marker.]


II. NON-ACADEMIC MEASURES


Home Adjustment—Bell Adjustment In- 
ventory. 1. The findings of this investigation 
reveal no completely reliable differences be- 
tween any of the groups on the home adjust- 
ment section of the Bell Adjustment Inven- 
tory (Table XXI and Chart 14). However, 
one striking fact is indicated, namely, that 
with nearly complete reliability the poor nega- 
tive deviates (3C) are superior in their home 
adjustment to any of the other deviating
groups. The differences between the mean of
this group and of each of the other three 
groups when divided by their respective 
standard errors are 2.29, 2.75, and 2.44, a 
result which indicates about ninety-nine 
chances in one hundred that the differences 
are greater than zero. 

2. There is some suggestion that the good
positive deviates (1A) experience better (83
chances in 100) home adjustment than the
good negative deviates (1C).

3. The negative deviates as a whole (C) 
tend to be better adjusted than the positive 
deviates (A) although the data are not con- 
clusive in this regard (80 chances in 100). 

These observations lead to the inference 
that adjustment to home conditions influences 
variation from predicted performance, especi- 
ally for the students of poor promise. The 
data indicate that students with inferior home 
adjustment, as measured by this test, tend to
improve the quality of their scholastic attain- 
ment, while those with better home adjust- 
ment tend to take things easier in college, or 
at least accomplish less than expected in terms 
of their aptitude and previous achievement 
in school. 

Numerous investigations have been made to
discover the relationship between home conditions
and academic achievement. In a survey
of the literature reporting some of the
major studies in this connection Sarbaugh¹⁰
points out the conflicts in the different findings
with reference to the importance of various
influences in determining the nature of an
environment favorable to academic achievement.
In her own investigation¹¹ of high
school seniors she found some differences in
the family backgrounds favorable to the superior
group of children. However, they were
not reported as reliably significant.

Harris,¹² in his survey of the literature concerning
investigations of the relation of factors
associated with the home and academic adjustments,
reported conflicting findings. In his
own investigation¹³ of college freshmen he
found no reliable relationships between home
surroundings and scholastic success.

10 Mary E. Sarbaugh, “Effect of Home Surroundings on
Academic Achievement,” Studies in Articulation of High
School and College (University of Buffalo Studies, XII;
Buffalo, N. Y.: University of Buffalo, 1936), pp. 245-276.

11 Ibid., pp. 275-276.

12 Daniel Harris, “The Relation to College Grades of Some
Factors Other Than Intelligence,” Archives of Psychology,
XX, No. 131 (July, 1931), 13-14.


TABLE XXI

DIFFERENCES BETWEEN THE MEANS* OF PAIRED GROUPS ON THE BELL ADJUSTMENT INVENTORY,
PART A, HOME ADJUSTMENT

Groups Compared     Mean of   Mean of   Difference   S.E.    D / S.E.   Chances
Group 1  Group 2    Group 1   Group 2   between      diff.   diff.†     in 100
                                        means

1A       1C         8.16      9.47      1.31         1.40    0.94       83
1A       3A         8.16      8.45      0.29         1.16    0.25       60
1A       3C         8.16      5.56      2.60         1.14    -2.29      98.9
1C       3A         9.47      8.45      1.02         1.44    -0.71      76
1C       3C         9.47      5.56      3.91         1.42    -2.75      99.7
3A       3C         8.45      5.56      2.89         1.18    -2.44      99.3
A        C          8.00      7.36      0.64         0.75    -0.85      80

* A low score signifies good adjustment.

† The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


CHART 14

RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR
BELL ADJUSTMENT INVENTORY, HOME ADJUSTMENT

[Chart not reproduced. Legend classes: less than 1; 1.0 to 2.0; 2.1 to 3.0; 3.1 and over; "greater than" marker.]

Shuttleworth,¹⁴ in a study of college freshmen,
found a slight degree of relationship between
favorable intellectual and cultural home
backgrounds and scholastic success. On the
other hand, Wagner¹⁵ reports the opposite. She
found more favorable home backgrounds for 
the students who did not do work of as high 
calibre as that predicted for them by their 
previous records. This conclusion agrees in 
general with the findings of this investigation, 
especially for the students of poor promise, 
although this study used the student’s check- 
list of statements describing home conditions 
rather than the actual objective reports of the 
home backgrounds which Wagner used. 

A possible interpretation of the favorable 
relationship of poorer home adjustment to 
favorable variation from predicted college 
work is given by Wagner in the following 
statement: 

Both because of their wish to obviate 
present lack of cultural aspects in their 
home environment and because of their 
greater awareness of their educational 
opportunity, these students are more likely 
to make the most of their time investment.¹⁶


Health Adjustment—Bell Adjustment In- 
ventory. 1. Health adjustment does not 
appear to be a characteristic which differenti- 
ates reliably between students of good promise 
and students of poor promise, or between 
positive and negative deviates (Table XXII
and Chart 15).

13 Ibid.

14 Shuttleworth, op. cit., pp. 431-432.

15 Wagner, op. cit., p. [illegible].

16 Ibid., p. 208.


TABLE XXII

DIFFERENCES BETWEEN THE MEANS* OF PAIRED GROUPS ON THE BELL ADJUSTMENT INVENTORY,
PART B, HEALTH ADJUSTMENT

Groups Compared     Mean of   Mean of   Difference   S.E.    D / S.E.   Chances
Group 1  Group 2    Group 1   Group 2   between      diff.   diff.†     in 100
                                        means

1A       1C         6.93      7.82      0.89         1.12    0.79       79
1A       3A         6.93      7.35      0.42         0.95    0.44       66
1A       3C         6.93      6.28      0.65         1.01    -0.65      74
1C       3A         7.82      7.35      0.47         1.12    -0.42      65
1C       3C         7.82      6.28      1.54         1.17    -1.32      90
3A       3C         7.35      6.28      1.07         1.01    -1.06      85
A        C          7.35      6.77      0.58         0.56    -1.03      84

* A low score signifies good adjustment.

† The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


CHART 15

RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR
BELL ADJUSTMENT INVENTORY, HEALTH ADJUSTMENT

[Chart not reproduced. Legend classes: less than 1; 1.0 to 2.0; 2.1 to 3.0; 3.1 and over; "greater than" marker.]

2. The data indicate a health adjustment 
of the poor negative deviates (3C) superior 
to that of the other deviating groups with 
chances in one hundred of seventy-four, 
eighty-five, and ninety for the three groups. 


3. The good positive deviates (1A) seem 
to be better adjusted (79 chances in 100) in 
the matter of health than the good negative 
deviates (1C). 


4. The health adjustment of the total 
positively deviating population (A) appears 
to be poorer (85 chances in 100) than that of 
the total negatively deviating group (C). 

5. The lack of health adjustment may be 
a factor which influences the achievement of 
a quality of college work superior to that 
expected, especially for the students of poor 
promise. The students in the best adjusted 
group appear to have made the poorest col- 
lege records. However, the findings are not 
conclusive because of the low reliabilities of 
the differences. 

Investigators report conflicting findings 
concerning the relationship between physical 
condition or the possession of physical defects 
and school marks. This conflict is shown in 
Harris'¹⁷ review of the major studies that had
been reported. In his own investigation he 
found no reliable relationship between physi- 
cal defects and scholastic attainment. 

Wagner,¹⁸ in her study of the variations
from predicted achievement of college students 
reported that the students who made records 
better than expected estimated their health as 
“average” as contrasted with the ranking of 
“above average” for the less successful. On 
the other hand, Eckert,¹⁹ in her study of
superior and inferior college students, by the 
use of student self-judgments found no per- 
ceptible difference between the two groups in 
physical fitness. 

The variations in findings are probably due
to the inherent weaknesses of the questionnaire
method of securing data and to the consequent
basing of conclusions upon data with
varying degrees of reliability.

17 Harris, op. cit., pp. 12-13.

18 Wagner, op. cit., p. 234.

19 Ruth E. Eckert, “Who is the Superior Student?” Studies
in Articulation of High School and College (University of
Buffalo Studies, IX; Buffalo, N. Y.: University of Buffalo,
[year illegible]), p. [illegible].


TABLE XXIII

DIFFERENCES BETWEEN THE MEANS* OF PAIRED GROUPS ON THE BELL ADJUSTMENT INVENTORY,
PART C, SOCIAL ADJUSTMENT

Groups Compared     Mean of   Mean of   Difference   S.E.    D / S.E.   Chances
Group 1  Group 2    Group 1   Group 2   between      diff.   diff.†     in 100
                                        means

1A       1C         14.64     13.76     0.88         1.85    -0.48      68
1A       3A         14.64     14.00     0.64         1.88    -0.34      64
1A       3C         14.64     13.28     1.36         1.79    -0.76      77
1C       3A         13.76     14.00     0.24         1.82    0.13       55
1C       3C         13.76     13.28     0.48         1.72    -0.28      61
3A       3C         14.00     13.28     0.72         1.76    -0.41      65
A        C          14.22     14.03     0.19         1.02    -0.20      58

* A low score signifies good adjustment.

† The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


CHART 16

RELIABILITY OF DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS FOR
BELL ADJUSTMENT INVENTORY, SOCIAL ADJUSTMENT

[Chart not reproduced. Legend classes: less than 1; 1.0 to 2.0; 2.1 to 3.0; 3.1 and over; "greater than" marker.]

Social Adjustment—Bell Adjustment In- 
ventory. A study of Table XXIII and Chart 
16, which present the findings concerning the 
relative social adjustment of the various 
groups, reveals no appreciable differences between
any of the groups. Apparently social
adjustment as measured by the Bell Adjustment
Inventory is not a function of either
scholarship or of variations from predicted
scholastic achievement.

This finding is similar to that of Wagner,²⁰
who reports no appreciable difference between 
deviating groups on self-estimates of college 
freshmen on characteristics comparable to 
those measured by the social adjustment sec- 
tion of the Bell Adjustment Inventory. 


Emotional Adjustment—Bell Adjustment 
Inventory. 1. No reliable differences between 
the various groups in the quality of emotional 
adjustment as measured by the Bell Adjustment
Inventory were found (Table XXIV
and Chart 17).

2. There is a strong indication that, of all 
groups studied, the students of good promise 
who did not make the records predicted for 
them (1C) had the least satisfactory emo- 
tional adjustment. The chances in one hun- 
dred that the differences are greater than zero 
vary from ninety-three to ninety-six and one- 
half. Within these limitations of reliability 
the inference is made that students of good 
promise who obtain college records inferior 
to those expected (1C) are not as well ad- 
justed emotionally as those students who are 
predicted to do good work but exceed expec- 
tations (1A). Consequently, it seems probable 
that in emotional adjustment there is found 
a quality that influences activity that leads 
students of good promise to greater achieve- 
ment than expected, and in the lack of such 
adjustment there is found a quality associated 
with the failure of students of good promise 
to obtain college averages predicted for them. 
The observation that both groups of subjects 
show some superiority over the negatively 

* Wagner, op. cit., p. 229. 










emotional adjustments with low scholarship. 
As Wagner** suggests, the relative lack of 
socio-economic security may be serving to 
stimulate the development of secondary and 
tertiary drives in an effort to compensate. 
Conversely, good home adjustment would not 
give rise to such compensatory motivation. 


e. The poor positive deviates (3A), while 
tending to be more poorly adjusted in their 
home, health, and emotional relations than 
the less successful below-average students 
(3C), appear at the same time to be better 
adjusted in these ways than the less success- 
ful students of good promise (1C). This 
finding suggests that favorable adjustments 
of these types are associated with students 
having less academic promise at the beginning 
of their college careers (3A and 3C). 


3. A second striking observation is noted 
concerning the findings on the Bell Adjust- 
ment Inventory, namely, the similarity of the 
relationship between the paired groups on 
the home adjustment, health adjustment, and 
emotional adjustment sections of the Bell 
questionnaire. If a relationship other than 
chance is assumed, it may be inferred that 
there is an inter-relationship between the 
qualities that are measured by the three dif- 
ferent tests. A classification of types of 
adjustment such as Bell has used may be 
even more in the nature of an administrative 
device than a valid test founded on different 

™ Wagner, op. cit., p. 236. 




discrete factors. Perry®* in his investigation 
of group factors in adjustment questionnaires 
cautions against the assignment of terms to 
indicated factors on the basis of inspection 
and rationalization. He concludes that name 
assignment to such a factor is arbitrary and 
depends on the meaning of the term as viewed 
by the person who designates it. 

Choice of Life Career at Time of College 
Entrance. For the purpose of this investiga- 
tion the only part of the vocational question- 
naire used was the question that asked 
whether or not a very definite choice of a 
life career had been made prior to registra- 
tion. A tabulation of the results and a 
graphic representation of the data are given 
in Table XXVI and Chart 18. Approxi- 
mately the same percentage of the total posi- 
tively deviating group (A) and the total 
negatively deviating group (C) had made 
their vocational decision, the percentages 
being 63.83 and 61.46, respectively. A dif- 
ference in percentage of only 1.85 was found 
between the good positive group (1A) and 
the good negative group (1C). There was a 
more marked difference between the poor 
positive group (3A) and the poor negative 
group (3C), namely, 6.38 in favor of the
former. A noteworthy fact is the great dif- 
ference between the students of good promise 
(1A and 1C) and the students of poor prom- 
ise (3A and 3C), for 52.13 per cent of the 
total group of students of good promise had 

% Perry, op. cit., p. 79. 


TABLE XXVI

PERCENTAGES OF POPULATION OF SUB-GROUPS WHO HAD MADE VOCATIONAL CHOICES AT TIME
OF ENTRANCE IN COLLEGE

                                            Number      Number      Percent
                                            who         who         who
                                Total       answered    had made    had made
Group                           Number      question    choice      choice

1A                                35           32          16        50.00
1B                                41           35          19        54.29
1C                                31           27          14        51.85
2A                                34           29          18        62.07
2B                                66           53          26        49.06
2C                                50           40          24        60.00
3A                                42           33          26        78.79
3B                                47           39          20        51.28
3C                                36           29          21        72.41
A  Total positive deviates       111           94          60        63.83
B  Total non-deviates            154          127          65        51.18
C  Total negative deviates       117           96          59        61.46
1  Total good students           107           94          49        52.13
2  Total average students        150          122          68        55.74
3  Total poor students           125          101          67        66.34

Total                            382          317         184        58.04













Wagner*® reported a tendency on the part 
of the positively deviating group to claim to 
suffer greater disturbance from scoldings and 
to worry less than the negatively deviating 
group. She suggests that: 

the social sensitivity evidenced by these
more successful students plays a real part
in keeping them on their academic job.”


Adjustment as Measured by the Bell Ad- 
justment Inventory. 1. Most striking is the 
observation that not a single completely 
reliable difference was found between any of 
the four extreme deviating groups on any one 
of the four types of adjustment purported 
to be measured by this test. This situation 
will be readily seen by referring to Charts 14, 
15, 16, and 17. A probable interpretation is 
that no significant relationship exists between 
either the quality of academic achievement 
in college or the receiving of marks above or 
below those predicted and the adjustive char- 
acteristics measured by this inventory test. 
It would therefore appear that the Bell test 
has characteristics similar to those of other 
current adjustment questionnaires, for Perry** 
in his statistical analysis of the relationship 
between the various questionnaires and 
academic attainment found no significant 
relationships with any of them.*® Further- 
more, this finding in general agrees with that 

™* Wagner, op. cit., p. 229.

* Loc. cit. 

* Raymond C. Perry, A Group Factor Analysis of the 
Adjustment Questionnaire (Southern California Education 
Monographs, 1933-1934 Series, No. 5; Los Angeles: Univer- 
sity of Southern California Press, 1934), p. 78. 

* The questionnaires which Perry used were the Laird Personal
Inventories B2 and C2, the Bernreuter Personal Inventory,
Scales B1-N, B2-S, B3-I, and B4-D, the Allport Reaction Study,
and the Pressey X-O Tests, Affectivity and Idiosyncrasy.







of other investigators whose reports are sum- 
marized by Toops and Kuder in their review 
of investigations concerned with the relation 
of personality to scholastic records. They 
state: “With few exceptions, personal data 
have proved to be of little use in prognosti- 
cating college achievement.’”*° 

2. The possibilities of relationship between
the various types of adjustment measured and
the tests used are shown in the following
statements (Table XXV):

a. The means of the total negative deviates 
(C) on all four tests are smaller than the 
means for the total positive deviates (A). 
This finding suggests better home, social,
health, and emotional adjustments for the
subjects whose college marks fell below the 
level expected (C). The chances in one hun- 
dred that there is a difference greater than 
zero are eighty, fifty-eight, eighty-four, and 
sixty-two, respectively. 

b. Except on the social adjustment test, 
the differences between the good positively 
deviating group (1A) and the good negatively 
deviating group (1C) are in favor of the 
former. This result suggests a relationship of 
high college averages to home, health, and 
emotional adjustments. 

c. The same relationship and suggestion 
are indicated for the good positive deviates 
(1A) over the poor positive (3A) deviates. 

d. The students predicted to make rela- 
tively low scholastic records and who obtained 
lower ones (3C) showed a consistent superi- 
ority over all groups on all four tests. This 
finding may indicate an association of rela- 
tively good home, social, health, and 


* Herbert Toops and G. Frederic Kuder, “Psychological
Tests,” Review of Educational Research, V (June, 1935), 223.


TABLE XXV

DIFFERENCES BETWEEN THE MEANS OF PAIRED GROUPS ON THE SUBDIVISIONS OF THE BELL
ADJUSTMENT INVENTORY EXPRESSED IN NUMBER OF CHANCES IN ONE HUNDRED THAT
DIFFERENCES ARE GREATER THAN ZERO

                              Difference between Means*

                        Home          Social        Health        Emotional
Groups                  adjustment    adjustment    adjustment    adjustment

Total C - total A         80            58            84            62
1A - 1C                   83           -68            79            96
1A - 3A                   60           -64            66            63
3C - 1A                   98.9          77            74            53
3C - 1C                   99.7          61            90            96
3C - 3A                   99.3          65            85            65
3A - 1C                   76           -55            65            93

* Except where indicated by a minus sign the difference is in favor of the group listed first in each pair.
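The sign convention of Table XXV can be made explicit with a small lookup. The sketch below, in Python, is illustrative only: the function and variable names are not part of the study, and three of the row labels are read from a damaged scan and matched against Table XXIV and the accompanying statements, which is an assumption.

```python
SECTIONS = ("home", "social", "health", "emotional")

# Chances in 100 from Table XXV. Per the table footnote, a positive value
# favors the first group of the pair and a negative value favors the second.
# Row labels for 1A-1C, 3C-3A, and 3A-1C are reconstructed (assumption).
TABLE_XXV = {
    ("C", "A"): (80, 58, 84, 62),
    ("1A", "1C"): (83, -68, 79, 96),
    ("1A", "3A"): (60, -64, 66, 63),
    ("3C", "1A"): (98.9, 77, 74, 53),
    ("3C", "1C"): (99.7, 61, 90, 96),
    ("3C", "3A"): (99.3, 65, 85, 65),
    ("3A", "1C"): (76, -55, 65, 93),
}


def favored_groups(pair):
    """Return, for each inventory section, the group the difference favors."""
    first, second = pair
    return {
        section: (first if chances > 0 else second)
        for section, chances in zip(SECTIONS, TABLE_XXV[pair])
    }


def sections_at_or_above(pair, level):
    """List the sections whose chances in 100 reach `level`, ignoring direction."""
    return [
        section
        for section, chances in zip(SECTIONS, TABLE_XXV[pair])
        if abs(chances) >= level
    ]


if __name__ == "__main__":
    for pair in TABLE_XXV:
        print(pair, favored_groups(pair))
```

Read this way, every pair that lists group 3C first favors 3C on all four sections, which is the pattern described in statement d.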













needed to throw light upon the major problem 
with which this investigation has _ been 
concerned. 


I. SPECIFIC CONCLUSIONS


1. None of the fourteen measures used in- 
dicates reliably that students, with the excep- 
tion of those classified as superior, will attain 
scholastic records in their freshman year 
better or poorer than those predicted by a 
regression equation based on high school aver- 
ages and American Council percentile scores. 
However, there are indications that students
in general who do better work than expected
will (a) receive relatively low scores on the
Shank Test of Reading Comprehension; (b)
obtain scores on the Bell Adjustment Inventory
which suggest poor home, health,
social, and emotional adjustments; and (c)
receive relatively high total scores on the
Sones-Harry High School Achievement Test
and on the social science, natural science, and
language and literature sections of this test.

2. Students predicted to do superior col- 
lege work (1A and 1C) may be reliably ex- 
pected to make academic college records bet- 
ter than their high school records if they test 
high on the mathematics section of the Sones— 
Harry High School Achievement Test. 

Positive deviation of students of good 
promise (1A) is further indicated by (a) 
relatively good home, health, and emotional 
adjustments, (b) high scores on the
Sones-Harry High School Achievement Test,
on the social science section of that test, on 
both the reasoning and computation sections 
of the Progressive Mathematics Tests, and 
on the Barrett-Ryan English Test, and to 
a lesser extent by (c) high scores on the 
Shank Test of Reading Comprehension, on 
the natural science and the language and 
literature sections of the Sones—Harry High 
School Achievement Test, and by a relatively 
poor social adjustment, as measured by the 
Bell Adjustment Inventory. 

It was found that all differences between 
the groups on all measures, with the excep- 
tion of social adjustment, were in favor of 
the students of good promise who exceeded 
expectancy (1A). 

3. No reliably significant difference was
shown on any of the measures used between 
the two deviating groups of the students pre- 
dicted to be less successful in college (3A and 
3C). 




The less successful group of students of 
poor promise (3C) was indicated as superior
to similar students who were more successful 
(3A) by higher scores on all academic meas- 
ures excepting social science. 

They were also favored on all measures of 
adjustment. This finding is almost the reverse 
of the finding for the deviating groups of 
students of better promise (1A and 1C). 

A larger percentage of the more successful 
students of poor promise (3A) had made a 
vocational choice before entering college than 
the less successful students of poor promise 
(3C), whereas there was very little difference 
between the deviating groups of the students 
of good promise (1A and 1C).

4. The more successful group of students 
of good promise (1A) was shown to be 
reliably superior to the more successful group 
of students of poor promise (3A) on all 
academic measures. 

No reliable differences between these groups 
were obtained on the measures of adjust- 
ment. In all types of adjustment, excepting 
social adjustment, there was an indication of 
superiority in favor of the students of good 
promise (1A). 

The percentage of students of poor promise 
(3A) who had made a choice of life-career 
was far larger than the percentage of students 
of good promise (1A). 

5. Completely reliable superiority of the 
good negative deviates (1C) over the poor 
negative deviates (3C) was found on the 
measures of high school achievement, language 
and literature, social science, reading com- 
prehension, and the mechanics of English. 

Nearly reliable differences in favor of the 
students of good promise who did not obtain 
the college averages predicted (1C) were 
shown for the measures of natural science 
information and mathematical ability. 

While no reliable differences were indicated 
between the two groups of less successful 
deviates (1C and 3C) on the adjustment 
measures, suggestions of differences on all 
measures in favor of the less successful 
students of poor promise (3C) were shown. 

A much larger percentage of the students 
of poor promise who had not made the col- 
lege averages expected (3C) had made their 
occupational choice than had the students of 
good promise whose college records did not 
come up to those predicted (1C). This com- 
parison reflects a condition similar to that 
















[Chart 18: Percentage of Groups Who Had Made Choice of Vocation at Time of Entrance to College. Horizontal axis: percentage.]


made a definite vocational choice in contrast 
to 66.34 per cent of the total group of stu- 
dents of poor promise. 

The findings suggest that the fact that a 
student has indicated that he has made a 
definite vocational decision before beginning 
his college program does not account to any 
appreciable degree for work inferior or supe- 
rior to that predicted for him. The chances 
are about even that if a student obtains a 
college record as predicted he will have made 
a vocational choice, and if he is a student of 
poor promise, the chances are appreciably 
greater (about 10 in 100) that his choice 
will have been made than if he were a student 
of good promise. Vocational decision appears, 
therefore, to be more closely, although in- 
versely, related to the quality of scholarship 
than it is to variation of the achieved record 
from the predicted record. 
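The percentages quoted in this discussion can be retraced from the counts in Table XXVI. The sketch below, in Python, pools the number of students who answered the question and the number who had made a choice for any set of sub-groups. The counts are transcribed from the table; the function and variable names are illustrative and are not part of the study.

```python
# (number who answered the question, number who had made a choice),
# transcribed from Table XXVI for the nine sub-groups.
COUNTS = {
    "1A": (32, 16), "1B": (35, 19), "1C": (27, 14),
    "2A": (29, 18), "2B": (53, 26), "2C": (40, 24),
    "3A": (33, 26), "3B": (39, 20), "3C": (29, 21),
}


def percent_decided(groups):
    """Percentage of respondents in the pooled groups who had made a choice."""
    answered = sum(COUNTS[g][0] for g in groups)
    decided = sum(COUNTS[g][1] for g in groups)
    return round(100 * decided / answered, 2)


if __name__ == "__main__":
    print("Positive deviates (A):", percent_decided(["1A", "2A", "3A"]))
    print("Negative deviates (C):", percent_decided(["1C", "2C", "3C"]))
    print("Good promise (1):     ", percent_decided(["1A", "1B", "1C"]))
    print("Poor promise (3):     ", percent_decided(["3A", "3B", "3C"]))
    print("3A minus 3C:          ",
          round(percent_decided(["3A"]) - percent_decided(["3C"]), 2))
```

Pooling counts before dividing, rather than averaging the sub-group percentages, matches the way the total rows of the table are built.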

Investigators report conflicting findings con- 
cerning the relationship between occupational 
decision and the quality of scholastic work. 
Harris found no statistically reliable rela- 
tionship, and in his survey of literature he 
presents the relation of occupational choice 
and school marks as reported by four in- 
vestigators. He states: 




According to Lloyd-Jones, superior students
tend to have a more definite idea of
why they came to college and what they 
want to be than the average student. Craw- 
ford also finds higher grades associated 
with definiteness of occupational choice, 
(but not with the knowledge of a definite 
position awaiting one after graduation; or 
with unhampered choice of one’s own occu- 
pation). On the other hand Kefauver and 
Shuttleworth did not find that those with 
definite occupational choices did better 
than those without. Shuttleworth also re- 
ports no relationship between reasons given 
for coming to college and grades.** 


Achilles** in his study of 4,527 under- 
graduates in fifty colleges reported forty-one 
per cent of the “decided” group as above 
average in scholarship, and only seven per 
cent below, while but twenty per cent of the 
“undecided” group were above average and 
fourteen per cent were below. Marshall’ 
reported similar conclusions from his study 
of ninety-one college seniors. He found that 
the “decided” group averaged higher than 
the “undecided” group by four-tenths of a 
grade mark in the freshman year and about 
one-fourth of a grade mark in the first three 
years of college. On the other hand, Wagner** 
discovered no effect of the time of occupational
decision on college success.

The findings of this investigation are in 
agreement with Wagner in that vocational 
choice appears to be unrelated to the receiving 
of better or poorer college averages than those 
predicted. They also are in apparent agreement
with those investigators who found no significant
relationship between life-career decision
and good scholarship. 


CHAPTER V 
CONCLUSIONS 


In this chapter conclusions concerning the 
findings of the investigation will be presented 
under three major headings, namely, specific 
conclusions, general conclusions, and sugges- 
tions concerning further experimentation 

% Harris, op. cit., pp. 10-11. 

™ Paul S. Achilles, “Vocational Motives in College, Extent 
and Significance of Career Decisions,” Occupations, XIII 
(April, 1935), 624-628. 

33 M. V. Marshall, “Life-Career Motive and Its Effect on
College Work,” Journal of Educational Research, XXIX
(April, 1936), 596-598.

% Wagner, op. cit., p. 223.
















social, and emotional adjustments may serve 
as incentives for students to do better work 
in college than expected, perhaps by way of 
compensation for feelings of inferiority in 
these phases of living. However, exception 
must be made in the case of the superior and 
most successful group (1A) for whom good 
adjustments appear to be either an incentive 
or a concomitant of improved attainment. 

Because of the similarity of the relation- 
ships between all groups on the home, health, 
and emotional adjustment sections of the Bell 
Adjustment Inventory, the use of these ad- 
justment categories may be questioned, for 
the designations may be more in the nature 
of arbitrarily assigned categories than repre- 
sentative of discrete factors in personality 
adjustment. 

14. The attainment of college averages 
either superior or inferior to those predicted 
was shown to have practically no relationship 
to definite choices of life work by students
before the beginning of their college program. 

There was some indication that a voca- 
tional decision served as a stimulus to stu- 
dents of poor promise (3A) to improve upon 
the expected quality of their work. 

The choice of a life career is more typically 
a characteristic of students of poor promise 
(3A and 3C) than of students of good promise 
(1A and 1C). 


II. GENERAL CONCLUSIONS 


On the basis of the findings of this in- 
vestigation these general conclusions have 
been formulated: 

1. The use of high school averages uncom- 
bined in a regression equation with American 
Council percentile scores will serve most 
practical purposes. The increased labor in- 
volved in the use of combined criteria seems 
disproportionate to the increase in the ac- 
curacy of prediction, especially when the 
probable values inherent in the use of 
predictive measures are considered. 

2. The classification of deviating groups of 
students in accordance with their good or poor 
promise as college students appears to have 
been justified, for reliable differences on 
various measures were shown between such 
groups although none were shown when all 
students who exceeded expectancy (1A and 
3A) were compared with the total group of 
students whose college averages were actually 
below those predicted for them (1C and 3C). 




3. Measures of academic achievement and 
of non-academic adjustments at the time of
entrance to college, on the whole, render rela- 
tively little assistance in predicting whether 
students’ scholastic adjustments in college will 
exceed or fall below expectancy. However, 
when they are administered with discretion 
their use is probably justified because they 
help in a small way to provide administra- 
tors, personnel workers, and instructors with 
additional knowledge and insights concerning 
the students with whom they are associated. 


4. While the statistical treatment of group 
data concerning entering college students has 
certain values, as suggested in the conclusions 
above, these values are definitely limited, 
especially in relation to the individual student. 
Great reliance upon statistical findings may 
lead to a failure to view each student as a 
unique personality worthy of individual and 
special consideration. From this point of 
view, specific analysis of all measures for a 
particular student may provide valuable in- 
formation and insights concerning his nature, 
achievements, potentialities, and needs. As 
materials for individual student conferences, 
the measures used appear to be most useful. 
Many of the values inherent in them would 
be lost if they were treated statistically and 
if the findings were used only for general 
administrative purposes. The data of this 
investigation (Chart 19) show that some of 
the most potentially able students, as indi- 
cated by all the measures applied, actually 
make unexpectedly and markedly inferior 
records in college. The data also show that 
some of the students whom the criteria distinguish
as of markedly inferior promise, and
whose previous accomplishments and adjust- 
ments are in agreement, actually attain college 
records higher than those even of students of 
superior promise who receive averages higher 
than those expected. It is these observations 
that lead to the emphasis upon the need (a) 
to view the individual student as a distinct 
person, (b) to work with him with all avail- 
able knowledge, and (c) to avoid the smug 
feeling of contentment that comes with 
dependence upon the general conclusions of 
statistical analyses, which, as this study has 
indicated, obscure important data regarding 
the individual. There is obviously no such 
thing as a generalized person, however reliable 
the ratios of the differences between the means 













indicated for the good and poor groups of 
students whose attainment exceeded that 
expected (1A and 3A). 

6. The students of good promise who 
obtained better college averages than pre- 
dicted (1A) were found reliably to excel the 
less successful students of poor promise (3C) 
on all measures of academic ability. 

Between these two groups no reliable differences
were indicated for the measures of adjustment.
However, all differences were in favor
of the less promising group of students. 

The group of students of poor promise 
(3C) also showed a greater percentage who 
had made their vocational decision. 

7. When the students of good promise who 
did not achieve predicted averages (1C) were 
compared with the students of poor promise 
who exceeded expectancy (3A) it was found 
that the former were superior to the latter 
on all academic measures. These differences 
were completely reliable on all the tests ex- 
cepting those dealing with natural science, 
social science, and mathematical computation. 
On these last three measures the differences
were nearly reliable. 

No reliable differences on adjustment 
measures were obtained. However, there were 
indications of better home, health, and emo- 
tional adjustment for the students of poor 
promise who obtained college averages better 
than those predicted for them (3A). 

Choices of life-careers had been made by 
an appreciably larger percentage of the stu- 
dents of poor promise (3A and 3C) than of 
the students of good promise (1A and 1C). 
This difference conforms to the findings in all 
comparisons of deviating groups of students 
of poor promise (3A and 3C) with deviating 
groups of students of good promise (1A and 
1C), regardless of whether the variation was 
positive or negative. 

8. Students who have been predicted to do 
good work (1A and 1C) at the time of col- 
lege entrance, demonstrate a better mastery 
of the mechanics of English usage in writing 
than students who have been predicted to do 
less satisfactory work (3A and 3C). Students 
of good promise with relatively high ability 
in written language may be expected to attain 
better scholastic adjustment in college than 
expected, and students of good promise whose 
measures of English usage appear relatively 
low in the distribution may be expected to 
obtain college averages lower than those pre- 







dicted for them. Therefore, for a student of 
good promise, English usage, as measured by 
the Barrett—Ryan test, appears to be a quality 
that is either related to the improvement of
academic attainment, or exists as a concurrent 
factor with it. 

9. Relatively high ability to read comprehendingly
is a characteristic which differentiates
students of good promise (1A and 1C)
from students of poor promise (3A and 3C)
at the beginning of their college careers. How- 
ever, it does not indicate reliably a relation- 
ship between the predicted college averages 
and those actually recorded. 

10. Students of good promise (1A and 1C) 
possess superior ability in both the reasoning 
and computation aspects of mathematics. 
However, this investigation does not show 
whether it is the mastery of mathematics, 
or the possession of those qualities which 
facilitate such mastery, that differentiates the 
superior student from the student of lesser 
achievement (3A and 3C). Neither do the 
findings indicate reliably that mathematical 
ability serves to influence students to do either
better or poorer work in college than predicted 
from their previous scholastic records and 
American Council percentile scores. 

11. Superior knowledge of both natural 
science and social science is indicated, but not
with complete reliability, as a characteristic
both of students of good promise (1A and
1C) and of students whose standard of performance
in college is better than expected
(1A and 3A). 

12. Information concerning literary, social, 
and scientific facts, as measured by the tests 
used, appears to be associated with scholastic 
achievement. There are some indications that 
such knowledge contributes to the attainment 
of a better scholastic adjustment than ex- 
pected, one which is not accounted for by 
superior intelligence. 

13. No reliable difference was shown 
between students of good promise (1A and 
1C) and students of poor promise (3A and 
3C), and no significant relationship between 
college records obtained and college records 
predicted was found by the use of any of
the measures of adjustment. 

The fact that the means on all four tests 
were greater for the total positively deviating 
group (A) than for the total negatively de- 
viating group (C) indicates, although not 
reliably, that relatively poor home, health, 










TABLE XXIV

DIFFERENCES BETWEEN THE MEANS* OF PAIRED GROUPS ON THE BELL ADJUSTMENT INVENTORY,
PART D: EMOTIONAL ADJUSTMENT

Groups Compared           Means             Difference    S. E.     D / S. E.    Chances
Group 1    Group 2    Group 1    Group 2    between       diff.     diff.†       in 100
                                            means

1A         1C         10.04      13.31       3.27          1.84       1.77         96
1A         3A         10.04      10.55       0.51          1.54       0.33         63
1A         3C         10.04       9.92       0.12          1.42      -0.08         53
1C         3A         13.31      10.55       2.76          1.92      -1.44         93
1C         3C         13.31       9.92       3.39          1.82      -1.86         96
3A         3C         10.55       9.92       0.63          1.52      -0.42         65
A          C          10.87      10.59       0.28          0.92      -0.32         62

* A low score signifies good adjustment.
† The first group possesses the trait to a greater degree than the second group except where indicated
by a minus sign preceding the ratio of the difference to its standard error, when the reverse is true.


[Chart 17: Reliability of Differences between the Means of Paired Groups for Bell Adjustment Inventory, Emotional Adjustment. Legend classes: less than 1.0; 1.0 to 2.0; 2.1 to 3.0; 3.1 and over.]


deviating students of good promise (1C)
again suggests the possibility that variations
from predicted achievement which result in
improved scholastic records may be associated
with emotional adjustment. However, some
doubt is cast upon this interpretation when it
is noted that there is only a slight difference
(65 chances in 100) between the positive and
negative groups of students of poor promise
(3A and 3C).
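The arithmetic behind the last three columns of Table XXIV can be retraced with a short calculation. The sketch below, in Python, divides each difference between means by its standard error and converts the ratio to chances in one hundred through the normal curve. The normal-curve conversion is an assumption about how the printed figures were obtained, since the text does not name the table of chances that was used; the function and variable names are illustrative only.

```python
from statistics import NormalDist

# (group 1, group 2, mean of group 1, mean of group 2, S.E. of the difference),
# transcribed from Table XXIV.
ROWS = [
    ("1A", "1C", 10.04, 13.31, 1.84),
    ("1A", "3A", 10.04, 10.55, 1.54),
    ("1A", "3C", 10.04, 9.92, 1.42),
    ("1C", "3A", 13.31, 10.55, 1.92),
    ("1C", "3C", 13.31, 9.92, 1.82),
    ("3A", "3C", 10.55, 9.92, 1.52),
    ("A", "C", 10.87, 10.59, 0.92),
]


def critical_ratio(mean_1, mean_2, se_diff):
    """Ratio of the difference to its standard error.

    A low score signifies good adjustment, so the ratio is positive when the
    first group has the lower mean and negative when the second group does.
    """
    return (mean_2 - mean_1) / se_diff


def chances_in_100(ratio):
    """Chances in 100 that the true difference exceeds zero.

    Assumption: the area under the normal curve below |ratio|.
    """
    return 100 * NormalDist().cdf(abs(ratio))


RESULTS = {
    (g1, g2): (critical_ratio(m1, m2, se), chances_in_100(critical_ratio(m1, m2, se)))
    for g1, g2, m1, m2, se in ROWS
}

if __name__ == "__main__":
    for (g1, g2), (ratio, chances) in RESULTS.items():
        print(f"{g1} vs {g2}: ratio {ratio:+.2f}, chances in 100 about {chances:.0f}")
```

Under the stated assumption the computed ratios agree with the printed ones to within about two hundredths, and the computed chances fall within about one in one hundred of the printed values. Small discrepancies are to be expected from rounding and from whatever printed table of chances the original computation relied on.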


The findings of the present study conform 
with those of previous investigations, insofar 
as comparisons can be made within the limita- 
tions of the measures and procedures used by 
the various investigators. Stagner,21 in his
review of forty-five investigations, reported 
almost uniformly low, zero, or slightly negative
correlations between favorable personality
qualities and school achievement. Pintner,22
in his survey of investigations found
the same thing. In Stagner’s study, which in- 
volved seven different tests, he reported .15 
as the highest correlation obtained. His con- 
clusions are pertinent. They are: 


1. Linear correlations of intelligence, 
achievement and personality measures are 
low and are probably so as a result of the 
inherent nature of the relationship. 


2. Extreme personality trends seem to 
counterbalance advantages in aptitude, 
making for equal achievement in opposed 
groups. High emotionality and high self- 
sufficiency lead to lower achievement than 
would be predicted from intelligence
scores.23


Harris24 reported conflicting findings by
various investigators. In his investigation he 
found that extroversion and a feeling that one 
is handicapped, as measured by pencil-and- 
paper tests, characterized students who re- 
ceived lower grades than those predicted by 
their scores on the Alpha test.25


21 Ross Stagner, “The Relation of Personality to Academic
Aptitude and Achievement,” Journal of Educational Research,
XXVI (May, 1933), 648-660.

22 Rudolph Pintner, “Intelligence Tests,” Psychological
Bulletin, XXXII, No. 7 (July, 1935), 453-472.

23 Stagner, op. cit., p. 655.
24 Harris, op. cit., pp. 7-8.
25 Ibid., p. 48.









































[Chart 19: Means and Ranges of Subgroups on Distribution of Obtained College Averages and Predicted Averages. Horizontal axis: grade point average. Legend: range of obtained college averages; mean.]


of groups of individuals on any measure to 
their standard errors may be. 

5. The reliable findings, together with less 
reliable indications with regard to the rela- 
tionship of measures to unpredicted scholastic 
records, although limited, may possibly con- 
tribute in some small measure to the formula- 
tion of new or modified administrative and 
personnel procedures, to the development of 
better adapted curricula, to a new synthesis 
of educational principles, to a quickened 
awareness of the unique quality of student 
personality, and to a stronger determination 
to increase the favorable influences and de- 
crease the unfavorable influences which affect 
growing and developing personality, in so far 
as the power lies within the province and 
limitations of the administrator, personnel 
worker, and teacher to accomplish these ends. 
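The grouping that underlies these conclusions can be sketched in a few lines. The Python fragment below predicts a college average from a high school average and an American Council percentile score with a two-predictor linear equation, then labels a student by promise (1, 2, or 3) and by deviation (A, B, or C). It is a minimal sketch under stated assumptions: the coefficients, the promise cutoffs, and the tolerance band are hypothetical parameters that the caller must supply, since this section does not report the values used in the study, and the function names are illustrative only.

```python
def predicted_average(hs_average, ac_percentile, intercept, b_hs, b_ac):
    """Two-predictor linear estimate of the college average.

    The coefficients are hypothetical inputs, not the study's equation.
    """
    return intercept + b_hs * hs_average + b_ac * ac_percentile


def promise_level(predicted, good_cutoff, poor_cutoff):
    """Return '1' (good), '2' (average), or '3' (poor) promise.

    The cutoffs are assumptions supplied by the caller.
    """
    if predicted >= good_cutoff:
        return "1"
    if predicted < poor_cutoff:
        return "3"
    return "2"


def deviation_class(predicted, obtained, tolerance):
    """Return 'A' (exceeded prediction), 'C' (fell below), or 'B' (as predicted).

    The tolerance band is an assumption supplied by the caller.
    """
    gap = obtained - predicted
    if gap > tolerance:
        return "A"
    if gap < -tolerance:
        return "C"
    return "B"


def subgroup(predicted, obtained, good_cutoff, poor_cutoff, tolerance):
    """Combine promise level and deviation class into a label such as '1A'."""
    return promise_level(predicted, good_cutoff, poor_cutoff) + deviation_class(
        predicted, obtained, tolerance
    )


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    estimate = predicted_average(2.0, 50, intercept=0.1, b_hs=0.5, b_ac=0.01)
    print(subgroup(estimate, 2.1, good_cutoff=1.5, poor_cutoff=1.0, tolerance=0.3))
```

Keeping the cutoffs and tolerance as explicit arguments makes plain that the group labels depend on choices outside the regression itself, which bears on the caution expressed in general conclusion 4 about reading individual cases from group statistics.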


III. SUGGESTIONS FOR FURTHER
INVESTIGATION


1. One of the most notable observations 
concerning the characteristics of the different 
groups is the apparent paradox shown by the 
superiority on many of the measures of the 
students of poor promise who accomplished 
less than expected (3C) over other students 




of poor promise who obtained better records 
than expected (3A). This superiority was 
either reliably indicated or found with chances 
greater than eighty in one hundred for measures
of predicted college average, mechanics
of written English, reading comprehension, 
general mathematical information and mathe- 
matical reasoning, home adjustment, and 
health adjustment. Need for more informa- 
tion concerning this group (3C) is further 
suggested by the fact that the average intel- 
ligence of its members as measured by the 
American Council test, is reliably superior to 
that of the group of students of poor promise 
who did work superior to that predicted for 
them (3A). Experimentation in this connec- 
tion should probably center upon the types 
and quality of motivation, upon the nature 
of the environmental influences playing upon 
students in college, outside of college, and 
before college entrance, and upon the nature 
and type of the personal limitations and 
adaptations of individual students rated as 
having poor promise. 


2. Knowledge of the characteristics of the 
students’ native endowments and of their 
adjustments and achievements made before 
entrance to college is valuable and necessary. 
The same is true concerning the bearing that 
these may have upon the quality of their 
academic adjustment in college. While this 
investigation has been carried on in the hope 
of increasing information of this sort, at no 
time has the assumption been held that aca- 
demic adjustment as defined herein is entirely 
desirable. Further investigations are needed 
to find answers to such questions as: Just 
what is it that students adjust to when they 
are awarded the symbols of such adjustment 
in professors’ marks, or perchance in an A.B. 
degree, cum laude? Are the values represented 
by the symbols the most important values in 
terms of the potential and essential contribu- 
tions of higher education in a democracy? In 
terms of individual and social well-being are 
not materials other than those traditionally 
offered more valuable? In fact, may not 
adjustment to what exists in college be 
representative of a handicap more than a help 
from the viewpoint of long-time individual and 
social values? Are there not artificialities, 
vested interests, idiosyncrasies of institutional
traditions and faculty personalities, and other 
negative factors which enter into the admin- 
istration of symbols of academic adjustment 



















that have no legitimate place in an educa- 
tional institution? What are the valid pur- 
poses of higher education in a democracy, 
and how can student progress in achieving 
these objectives be measured? 


3. Further experimentation is needed con- 
cerning services, procedures, philosophies, and 
materials which may be introduced into the 
college environment which will take students, 
with the endowments, accomplishments, and 
potentialities with which they enter college, 
and cause them actually to rise superior to 
all of the prediction formulae thus far 
developed. For example, what types of 
remedial work in the fields of knowledge or 
skill can be provided with profit? What 
methods shall be followed, and what is their 
relative effectiveness? What adaptations of 
curricula to student need can be made, and 
what are student needs? What guidance serv- 
ices will render the most valuable assistance 
to students, and how can curriculum and 
guidance be made more nearly functions of 
the same thing? How can a sound set of 
objectives for a college be formulated, and 
how can the objectives be made a part of the 
nervous systems of administrators, personnel 
workers, and teachers? Further information 
on all these questions is needed if the 
learning environments and the quality of stu- 
dent adjustment in college are to be 
improved. 


BIBLIOGRAPHY 


Achilles, Paul S., “Vocational Motives in College,
Extent and Significance of Career Decisions,”
Occupations, 13:624-628, April,
1935.

Brammel, P. Roy, “Articulation of High 
School and College,” The Reorganization 
of Secondary Education. Office of Educa- 
tion Bulletin No. 17, 1932, National Sur- 
vey of Education Monograph No. 10; 
Washington, D. C.: Office of Education, 
1933. 99 pp.

Crawford, A. B., “Forecasting Freshman 
Achievement,” School and Society, 31:125- 
132, January 25, 1930. 

Douglass, Harl R., The Relation of High 
School Preparation and Certain Factors to 
Academic Success at the University of 
Oregon. University of Oregon Publications, 
III, No. 1; Eugene, Oregon: University of 
Oregon, September, 1931. 61 pp. 




Eckert, Ruth, “Who is the Superior Student?” 
Studies in Articulation of High School and 
College. University of Buffalo Studies, 
9:11-50; Buffalo, N. Y.: University of
Buffalo, 1934. 

Frasier, George W., and J. D. Heilman, “Experiments
in Teacher-College Administration,
III: Intelligence Tests,” Educational
Administration and Supervision, 15:268-278,
April, 1928.

Garrett, Henry E., Statistics in Psychology 
and Education. New York: Longmans, 
Green and Company, 1926. 317 pp. 

Harris, Daniel, “The Relation to College 
Grades of Some Factors Other than Intel- 
ligence,” Archives of Psychology, XX, No. 
131, July, 1931. 55 pp. 

Holzinger, Karl J., Statistical Methods for 
Students in Education. Chicago: Ginn and 
Company, 1928. 372 pp. 

Marshall, M. V., “Life-Career Motive and Its
Effect on College Work,” Journal of Educational
Research, 29:596-598, April, 1936.

Odell, Charles W., Predicting the Scholastic 
Success of College Freshmen. Bureau of 
Educational Research, Bulletin No. 37. 
Urbana, Illinois: University of Illinois, 
September 13, 1927. 54 pp. 

Perry, Raymond C., A Group Factor Anal- 
ysis of the Adjustment Questionnaire. 
Southern California Education Mono- 
graphs, 1933-1934 Series, No. 5; Los 
Angeles: University of Southern Califor- 
nia, 1934. 88 pp. 

Pintner, Rudolph, “Intelligence Tests,” Psy- 
chological Bulletin, 32:453-472, July, 1935. 

Sarbaugh, Mary E., “Effect of Home Sur- 
roundings on Academic Achievement,” 
Studies in Articulation of High School and 
College. University of Buffalo Studies, 
13:245-276; Buffalo, N. Y.: University of 
Buffalo, 1936. 

Segel, David, Prediction of Success in Col- 
lege. Office of Education Bulletin No. 15, 
pp. 52-56. Washington, D. C.: Office of 
Education, 1934. 98 pp. 

Shuttleworth, Frank K., “Environmental and
Character Factors Involved in Scholastic
Success,” Journal of Educational Psychology,
20:424-434, September, 1929.

Stagner, Ross, “The Relation of Personality 
to Academic Aptitude and Achievement,” 
Journal of Educational Research, 26:648- 
660, May, 1933. 










Symonds, Percival, Measurement in Second- 
ary Education. New York: The Macmil- 
lan Company, 1928. 588 pp. 

Toops, Herbert, and G. Frederic Kuder, “Psy- 
chological Tests,” Review of Educational 
Research, 5:217, June, 1935. 

Wagner, Mazie Earle, “A Survey of the Liter- 
ature on College Performance Prediction,” 










Studies in Articulation of High School and 
College. University of Buffalo Studies, 
9:194-209. Buffalo, N. Y.: University of 
Buffalo, 1934. 

Wagner, Mazie Earle, “Studies in Academic Motivation,”
Studies in Articulation of High School and 
College. University of Buffalo Studies, 
13:181-242. Buffalo, N. Y.: University of 
Buffalo, 1936. 
















APPENDIX

TABLE XXVII

SAMPLE MASTER DATA

[Table body not recoverable from the scan. Legible column groups: Bell Adjustment Inventory; Sones-Harry High School Achievement Test.]







TABLE XXVIII

MEANS, STANDARD DEVIATIONS AND RANGES OF TOTAL, POSITIVE DEVIATES, NON-DEVIATES, AND NEGATIVE DEVIATES ON ALL MEASURES USED

[Table body not recoverable from the scan. Legible row groups: Number of Cases; Mean; Standard Deviation; Range. Legible column groups: Bell Adjustment Inventory; Barrett-Ryan English Test; Sones-Harry High School Achievement Test.]


TABLE XXIX

MEANS, STANDARD DEVIATIONS, AND RANGES OF ALL SUB-GROUPS ON ALL MEASURES USED

[Table body not recoverable from the scan. Legible column groups: Bell Adjustment Inventory; Barrett-Ryan English Test; Sones-Harry High School Achievement Test.]







TABLE XXX

RELIABILITY OF THE DIFFERENCES BETWEEN SUB-GROUPS ON ALL MEASURES USED*

[Table body not recoverable from the scan. Legible column groups: Bell Adjustment Inventory; Sones-Harry High School Achievement Test.]

* Reliabilities of the differences are expressed in terms of the ratio of the difference to its standard error.








THE VALUE OF CERTAIN FACTORS FOR DIRECT AND 
DIFFERENTIAL PREDICTION OF ACADEMIC SUCCESS* 


CLAUDE L. NEMZEK 
University of Detroit 


Intelligence tests are being used extensively 
for purposes of prediction; however, studies 
carried on by numerous investigators under a 
wide variety of conditions have adequately 
demonstrated that the functions measured by 
the types of intelligence tests now available 
show only a moderate degree of relationship 
to academic achievement. Reviews of these 
studies indicate that, among typical groups 
of high school students, the Pearson product- 
moment coefficient of correlation between intelligence
test results and academic achievement
is approximately .50 (Eurich and Carroll,¹
Lee,² Pintner,³ Segel,⁴ Turney⁵). This
degree of relationship suggests that non- 
intellectual factors play a large part in con- 
ditioning scholastic success. In order to be 
able to predict academic success more ade- 
quately than is possible by using intelligence 
tests alone, one must segregate those non- 
intellectual factors which, independent of the 
functions measured by our available intelli- 
gence tests, bear a significant relationship to 
measures of academic performance. 

The present study represents an attempt to 
determine the value of a number of non- 
intellectual factors for predicting school suc- 
cess as measured by teachers’ marks. 

The purpose of this study is to present an 
analysis of data in order to reveal any pos- 
sible values that chronological age at entrance 
to elementary school, amount of education of 
father, amount of education of mother, and 
occupational status of father may have for 
direct and differential prediction of academic 
success as measured by teachers’ marks. 


* From a thesis submitted to the Graduate Faculty of the
University of Minnesota in partial fulfillment of the requirements
for the degree of Doctor of Philosophy.

¹ Alvin C. Eurich and Herbert A. Carroll, Educational
Psychology. Boston: D. C. Heath and Co., 1935. Pp. 436.

² J. Murray Lee, A Guide to Measurement in Secondary
Schools. New York: D. Appleton Century Co., Inc., 1936.
Pp. 514.

³ Rudolph Pintner, Intelligence Testing. New York: Henry
Holt and Co., 1931. Pp. 555.

⁴ David Segel, Prediction of Success in College. Office of
Education Bulletin 1934, No. 15. Washington: United States
Government Printing Office, 1934. Pp. 98.

⁵ Austin H. Turney, Factors Other Than Intelligence That
Affect Success in High School. Minneapolis: University of
Minnesota Press, 1930. Pp.


Data were available in the records at Uni- 
versity High School, University of Minne- 
sota, for 196 boys and 156 girls. All of these 
cases had been graduated from University 
High School. The following ten variables 
were tabulated for each of the 352 cases: 


(1) Intelligence quotient 

(2) Chronological age in months at en- 
trance to elementary school 

(3) Amount of education of father in 
years 

(4) Amount of education of mother in 
years 

(5) Occupational status of father on the
Minnesota Scale
(6) Honor point average in mathematics 
(7) Honor point average in science 
(8) Honor point average in English 
(9) Honor point average in history and 
social science, and 
(10) Honor point average in languages. 

Only those cases were used wherein the 
results of at least two years of course-work in 
each of the five subject-matter fields were 
available. 

The measure of intelligence used in this 
study was based upon the results of five 
group intelligence tests. The tests employed 
were Army Alpha 8, Pressey Senior Classifi- 
cation, Haggerty Delta 2, Terman Group 
Test, Form A, and Miller’s Mental Ability 
Test, Form A. Intelligence quotients were 
computed from the results of each test for 
each individual. The authors’ manuals for 
the respective tests were followed as closely 
as possible in administering and scoring, and, 
for all cases except that of the Pressey Test,
in computing the intelligence quotients. In 
this instance, where the author’s norms proved 
inadequate for children who made unusually 
high scores, the difficulty was resolved by 
extrapolation. The intelligence quotients 
were in all instances converted into Stanford-Binet
equivalents by means of the method




proposed by Miller.⁶ Of the five intelligence
quotients, the middle value was chosen as the 
measure to be used for each individual. 
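
The choice of the middle value among the five converted quotients can be illustrated with a minimal sketch. The function name and the sample quotients are hypothetical and are not taken from the study's records; the sketch assumes only that five Stanford-Binet equivalents are on hand for each pupil.

```python
from statistics import median


def representative_iq(stanford_binet_equivalents):
    """Return the middle value of one pupil's five converted IQs."""
    if len(stanford_binet_equivalents) != 5:
        raise ValueError("expected five intelligence quotients, one per test")
    # With an odd number of values the median is the middle one after sorting.
    return median(stanford_binet_equivalents)


# Hypothetical converted quotients for one pupil (not data from the study).
example_iqs = [112, 121, 117, 109, 125]
print(representative_iq(example_iqs))
```

Taking the middle value rather than the mean keeps a single unusually high or low test result from shifting the measure.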


Marks at University High School are given 
in the form of letter ratings. For the present 
study the letter ratings were converted into 
honor point averages. Each quarter hour 
mark of A was given three honor points; each 
quarter hour mark of B, two honor points; 
each quarter hour mark of C, one honor 
point; each quarter hour mark of D, no 
honor points; and each quarter hour mark of 
F, minus one honor point. Then the total 
number of honor points in each of the five 
subject-matter fields was divided by the total 
number of quarter hours of marks involved in 
the respective subject-matter fields in order 
to obtain the five honor point averages. 
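
The conversion just described may be sketched as follows. The point values are those stated in the text; the function name, the data layout, and the sample marks are assumptions made for illustration.

```python
# Honor points per quarter hour, as stated in the text.
HONOR_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "F": -1}


def honor_point_average(marks):
    """Compute the honor point average for one subject-matter field.

    marks: iterable of (letter, quarter_hours) pairs (a hypothetical layout).
    """
    total_points = 0
    total_hours = 0
    for letter, quarter_hours in marks:
        total_points += HONOR_POINTS[letter] * quarter_hours
        total_hours += quarter_hours
    if total_hours == 0:
        raise ValueError("no quarter hours of marks were supplied")
    return total_points / total_hours


# Hypothetical record: four three-quarter-hour courses marked A, B, C, and F.
sample_marks = [("A", 3), ("B", 3), ("C", 3), ("F", 3)]
print(honor_point_average(sample_marks))
```

For the sample record the total is 9 + 6 + 3 - 3 = 15 honor points over 12 quarter hours, an average of 1.25.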

In Table I are presented the Pearson 
product-moment coefficients of correlation 
obtained by computing separately for boys 
and girls all of the intercorrelations for the 
ten variables which were available, together 
with the means and standard deviations of 
the ten variables. 

Of the forty coefficients of correlation 
showing the extent to which chronological age 
at entrance to elementary school, amount of 
education of father, amount of education of 
mother, and occupational status of father are 
related to honor point averages in mathe- 
matics, science, English, history and social 
science, and languages, not one is statistically 
significant, in the sense of exceeding four 
times its probable error. We may therefore 
conclude that the value of chronological age 
at entrance to elementary school, amount of 
education of father, amount of education of 
mother, and occupational status of father, for 
the direct prediction of academic success as 
measured by honor point averages derived 
from teachers’ marks, is negligible. 
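
The criterion of four times the probable error can be checked with a short sketch. The article does not print the formula for the probable error; the sketch assumes the conventional expression, 0.6745(1 - r²)/√N, which reproduces the probable errors listed in Table I. The function names are illustrative only.

```python
import math


def probable_error_of_r(r, n):
    """Conventional probable error of a Pearson r based on n cases (assumed formula)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


def exceeds_four_pe(r, n):
    """True when |r| is more than four times its probable error."""
    return abs(r) > 4 * probable_error_of_r(r, n)


# Boys (N = 196): occupational status of father with HPA in science, r = -.181.
print(round(probable_error_of_r(-0.181, 196), 3))
print(exceeds_four_pe(-0.181, 196))

# Boys (N = 196): intelligence quotient with HPA in mathematics, r = .456.
print(exceeds_four_pe(0.456, 196))
```

Under this assumption the first coefficient falls just short of four times its probable error, while the coefficient for the intelligence quotient exceeds it by a wide margin.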

Despite the fact that certain variables may 
be of little value for purposes of direct pre- 
diction, they may be more valuable as prog- 
nostic of differential ability. This is primarily 
due to the fact that in finding a differential 
correlation coefficient a negative and a posi- 
tive direct coefficient may be brought together 
and the effect is an additive one. A survey of 
Table I reveals that of the 40 coefficients of 
correlation showing the relation of chrono- 
logical age at entrance to elementary school, 


⁶ W. S. Miller, “The Variation and Significance of Intelligence
Quotients Obtained from Group Tests,” Journal of
Educational Psychology, 15: 359-366, 1924.







amount of education of father, amount of 
education of mother, and occupational status 
of father to the five honor point averages, 
twenty-two are positive and eighteen are neg- 
ative; furthermore, there are only six oppor- 
tunities for negative and positive direct co- 
efficients to have an additive effect to produce 
higher differential coefficients. 

Despite the fact that the direct coefficients 
are so low that they almost preclude any sig- 
nificant differential coefficients, the value of 
chronological age at entrance to elementary 
school, amount of education of father, amount 
of education of mother, and occupational 
status of father for purposes of differential 
prediction was determined by Segel’s⁷ method.
In Table II are included the differential prediction
coefficients⁸ based upon the data available
for the 196 boys and the 156 girls under
consideration. 
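
A differential prediction coefficient of this kind can be sketched as the correlation between a predictor x and the difference between two criteria a and b: the numerator is r_ax·σ_a minus r_bx·σ_b, and the denominator is the standard deviation of the difference. The formula below is read from the damaged footnote and agrees with the general expression for the correlation of a difference with a third variable; the function and parameter names are illustrative assumptions. The sample values are the Table I entries for boys: intelligence quotient (1), HPA in mathematics (6), and HPA in history and social science (9).

```python
import math


def differential_coefficient(r_ax, r_bx, r_ab, sd_a, sd_b):
    """Correlation of predictor x with the difference (a - b) of two criteria."""
    numerator = r_ax * sd_a - r_bx * sd_b
    # Standard deviation of the difference a - b.
    denominator = math.sqrt(sd_a ** 2 + sd_b ** 2 - 2 * r_ab * sd_a * sd_b)
    return numerator / denominator


# Boys, Table I: r(1,6) = .456, r(1,9) = .437, r(6,9) = .657,
# S D of variable 6 = .705, S D of variable 9 = .620.
coefficient = differential_coefficient(0.456, 0.437, 0.657, 0.705, 0.620)
print(round(coefficient, 3))
```

The result agrees with the entry for the pair 6-9 under variable 1 in Table II. Because the two direct coefficients are nearly equal and of the same sign, the numerator is small, which is why so few of the differential coefficients are of appreciable size.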

A study of Table II reveals that chrono- 
logical age at entrance to elementary school, 
amount of education of father, amount of 
education of mother, and occupational status 
of father have no significant value, practic- 
ally or statistically, for purposes of differen- 
tial prediction of the abilities measured by 
honor point averages in mathematics, science, 
English, history and social science, and lan- 
guages. Not one of the 80 differential predic- 
tion coefficients is as much as four times its 
probable error. 

Undoubtedly the mental functions meas- 
ured by honor point averages in mathematics, 
science, English, history and social science, 
and languages have a high degree of commu- 
nity of function. That this is probably true 
is indicated by the data in Table I. In the 
case of the boys, the intercorrelations obtained 


⁷ David Segel, Differential Diagnosis of Ability in School
Children. Baltimore: Warwick and York, Inc., 1934. Pp. 86.

David Segel, Prediction of Success in College. Office of
Education Bulletin 1934, No. 15. Washington, D. C.: Government
Printing Office, 1934. Pp. 98.

David Segel, “The Construction and Interpretation of Differential
Ability Patterns,” Journal of Experimental Education,
2: ...

David Segel, “Differential Prediction of Ability as Represented
by College Subject Groups,” Journal of Educational
Research, 25: 14-26, January, 1932; 25: 93-98, February,
1932.

David Segel, “Differential Prediction of Scholastic Success,”
School and Society, 39: 91-96, January 20, 1934.

David Segel and R. Gerberich, “Differential College
Achievement Predicted by the American Council Psychological
Examination,” Journal of Applied Psychology, 17: 637-645,
December, 1933.

J. M. Lee and David Segel, “The Utilization of Data
... for Direct Prediction in the Development of
Equations for Differential Prediction,” Journal of
Educational Psychology, 24: 550-554, October, 1933.
⁸ The formula used is:

\[
r_{(a-b)x} = \frac{r_{ax}\,\sigma_a - r_{bx}\,\sigma_b}{\sqrt{\sigma_a^{2} + \sigma_b^{2} - 2\,r_{ab}\,\sigma_a\,\sigma_b}}
\]







from these five honor point averages range 
from .611 to .766; in the case of the girls, 
from .710 to .825. 

The data demonstrate that the intelligence 
quotient has considerable value for direct 
prediction. In Table I, one may note that the 
relationships of the intelligence quotient to 




the five honor point averages range from .401 
to .502 for the boys; from .495 to .606 for 
the girls. These data corroborate the findings 
of other investigators. 

The ineffectiveness of the intelligence quo- 
tient for differential prediction is clearly por- 
trayed in Table II. Differential coefficients in- 





TABLE I

MEANS, STANDARD DEVIATIONS, AND INTERCORRELATIONS WITH PROBABLE ERRORS, OF TEN
VARIABLES,* FOR 196 BOYS AND 156 GIRLS**

                           1       2       3       4       5       6       7       8       9      10
Boys, Means           117.00   71.16   13.34   11.85    1.91   1.010   1.255    .980   1.145    .895
Boys, S D's            11.95    6.57    3.73    2.66     .94    .705    .745    .605    .620    .785

Var.   Girls   Girls
       Means   S D's

   1  117.40   12.60           -.187    .205    .179   -.170    .456    .499    .502    .437    .401
                                .046    .046    .047    .047    .038    .036    .036    .039    .040

   2   71.22    7.53   -.194            .093    .028   -.072   -.063   -.040   -.052   -.078   -.154
                        .052            .048    .048    .048    .048    .048    .048    .048    .047

   3   13.87    3.68    .077    .012            .522   -.665    .134    .170    .150    .082    .078
                        .054    .054            .035    .027    .047    .047    .047    .048    .048

   4   12.13    2.64    .140    .054    .451           -.329    .114    .070    .163    .097    .099
                        .053    .054    .043            .043    .048    .048    .047    .048    .048

   5    1.91    1.11   -.105   -.058   -.692   -.309           -.128   -.181   -.143   -.094   -.139
                        .053    .054    .028    .049            .047    .047    .047    .048    .047

   6   1.045    .745    .542    .011    .147    .162   -.107            .754    .642    .657    .662
                        .038    .054    .053    .053    .053            .021    .028    .027    .027

   7   1.100    .760    .606   -.052    .131    .112   -.115    .800            .621    .679    .611
                        .034    .054    .053    .053    .053    .019            .030    .026    .030

   8   1.335    .705    .564    .003    .075    .101   -.116    .753    .764            .766    .691
                        .037    .054    .054    .053    .053    .023    .022            .020    .025

   9   1.260    .740    .506   -.012    .057    .055   -.128    .741    .778    .825            .683
                        .040    .054    .054    .054    .053    .024    .021    .017            .026

  10   1.230    .865    .495   -.030    .157    .136   -.206    .710    .758    .806    .787
                        .041    .054    .053    .053    .052    .027    .023    .019    .021

* Variables: (1) IQ, (2) CA in months at entrance to elementary school, (3) education of father in years, (4) education of mother
in years, (5) occupational status of father, (6) HPA in mathematics, (7) HPA in science, (8) HPA in English, (9) HPA in history
and social science, (10) HPA in languages.

** Means and S D's of boys at top; means and S D's of girls at left; intercorrelations for boys above and to right of major
diagonal; intercorrelations for girls below and to left of major diagonal. The upper figure of each pair is the Pearson
product-moment correlation coefficient; the lower its probable error.


TABLE II

DIFFERENTIAL PREDICTION COEFFICIENTS* FOR THE TEN VARIABLES**

Variables         1         2         3         4         5

 6-7          -.103     -.030     -.062      .056      .088
              -.119      .031      .020      .024      .014

 6-8           .035     -.024      .010     -.035     -.010
               .014      .010      .111      .097      .004

 6-9           .091      .010      .079      .036     -.010
               .056      .032      .126      .146      .028

 6-10          .010      .012      .054    .00002      .030
              -.040      .055     -.042      .005      .158

 7-8           .113     .0001      .060     -.077     -.081
               .124     -.082      .092      .028     -.010

 7-9           .180      .033      .136     -.017     -.137
               .172     -.062      .155      .089      .010

 7-10          .084      .013      .097     -.037     -.039
               .057     -.022     -.063     -.057      .158

 8-9           .078      .040      .095      .092     -.068
               .055      .010      .026      .071      .010

 8-10         -.020      .016      .052      .036      .039
              -.059      .055     -.162     -.091      .188

 9-10         -.075      .013     -.014     -.030      .174
              -.100      .032     -.174     -.143      .156

* The upper figure of each pair is for the 196 boys; the lower for the 156 girls.

** Variables: (1) IQ, (2) CA in months at entrance to elementary school, (3) education of father in
years, (4) education of mother in years, (5) occupational status of father, (6) HPA in mathematics, (7)
HPA in science, (8) HPA in English, (9) HPA in history and social science, and (10) HPA in languages.










volving the intelligence quotient range from 
-.103 to .180 for the boys; from -.119 to
.172 for the girls. 


SUMMARY 


The purpose of this study was to determine 
the value of the intelligence quotient, chrono- 
logical age at entrance to elementary school, 
amount of education of father, amount of 
education of mother, and occupational status 
of father for purposes of direct and differen- 
tial prediction of academic success as meas- 
ured by honor point averages based upon 
teachers’ marks. Data were available for 196 
boys and 156 girls, all of whom had been 







graduated from University High School, Uni- 
versity of Minnesota, and all of whom had 
had at least two years of course-work in each 
of the subject-matter fields for which honor 
point averages were computed. 

The data demonstrate that chronological
age at entrance to elementary school, amount 
of education of father, amount of education 
of mother, and occupational status of father 
have negligible value for purposes of direct 
and differential prediction of academic success 
as measured by honor point averages derived 
from teachers’ marks; and that the intelli- 
gence quotient has value for direct prediction 
but not for differential prediction. 











A TECHNIQUE FOR THE MEASUREMENT OF
SOCIAL ADJUSTMENT* 


J. E. JANNEY 
Western College, Oxford, Ohio 


PURPOSE 


All too often it is assumed in faculty circles 
that the important criterion of a student’s
success in college is his academic achievement, 
—and of course academic achievement is ex- 
pressed primarily in terms of grades. Attempts 
to develop instruments for prognosis of col- 
lege success use academic record in this nar- 
row sense as the major criterion; thus intelli- 
gence tests are commonly validated by corre- 
lating these tests with point-hour ratio or 
other similar data. However, it is being rec- 
ognized increasingly that the college should 
aim to accomplish more than the development 
of academic competence. The development of 
social competence (using this phrase in a 
broad sense) should also be a major objec- 
tive; students should learn at college how to 
get along with their fellows and should de- 
velop certain capacities for leadership. Fur- 
ther, adjustment to the other sex is (at least 
in a woman’s college) a problem to which pre- 
sumably the college should give some 
thought, since such adjustment is almost cer- 
tain to be a major factor in the future life 
and happiness of these young women. 

The purpose of this research was to make 
a systematic study of college guidance prob- 
lems, taking special account of the last two 
objectives—the development of social adjust- 
ment to the same sex and to the other sex. 
It is believed that the investigation is distinc- 
tive in two respects. In the first place, rela- 
tively objective indices of social ability in re- 
lation to the same sex and in relation to the 
other sex were developed. To develop these
indices, and to use them along with the conventional
index of grades in a study of college
prognosis and guidance problems, is believed
in itself important. In the second place,
all three criteria of success have been related 
to an unusual variety of other data, including 
not only intelligence test scores, but also re- 


* From a thesis submitted in partial fulfillment of the requirements
for the Ph. D. in Psychology at Ohio State University,
1935. The writer wishes to express his appreciation
for the advice and counsel of Dr. S. L. Pressey, under whose 
direction this investigation was made. 


sults of tests of interests and attitudes and 
ratings on a number of traits by both stu- 
dents and faculty. The total results would 
thus seem to be of exceptional range, and to 
offer rich opportunities for comparisons of 
possible significance. 


PROCEDURE 


The study was made in a small liberal arts 
college for women located near a co-educational
state university in a typical college
town in the Ohio valley. The data used
are from the upper three classes of the col- 
lege. The institution gave exceptional oppor- 
tunities for such a study. Since the college 
student body was small, it was possible to 
find leading students and faculty members 
who knew everyone in the student group. 
Every one of this student group was known 
by the writer. All the students live in dormi- 
tories. The total situation in the college and 
in the community is well known to the writer. 
Opportunities for studying the total life of 
these young women and their total develop- 
ment were thus exceptional. 

The three criteria for the three types of 
college success above mentioned were as fol- 
lows: (a) academic success was indicated by 
point-hour ratio, or the number of credit 
points divided by the number of semester 
hours; (b) success in relation to other women 
students was indicated by an index of success 
made out from each student’s campus record. 
Thus each office and honor on the campus 
was given a rating, such as class president 40, 
athletic letter 25, etc., and a great variety of 
such items considered, including memberships 
on committees and other like minor recogni- 
tions, so that almost all students had a record 
of some sort. (c) Success in relations with the 
other sex was indicated by number of evening 
dates for the nine month school year as ob- 
tained from the dormitory “sign-out” book in 
which each young woman is required to record 
the name of each male caller. Careful inquiry 
indicated that these records were reasonably 
accurate and might be considered a real cri- 




terion of the extent to which a girl had made 
mutually satisfactory adjustments to the 
other sex. 
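
The first two criteria reduce to simple arithmetic, which may be sketched as follows. Only the weights for class president (40) and athletic letter (25) are taken from the text; the committee weight and the sample figures are hypothetical and serve only to illustrate the computation.

```python
# Weights for "class president" and "athletic letter" are the two given in
# the text; "committee member" is a hypothetical value for illustration.
ACTIVITY_WEIGHTS = {
    "class president": 40,
    "athletic letter": 25,
    "committee member": 5,  # assumed, not from the study
}


def point_hour_ratio(credit_points, semester_hours):
    """Criterion (a): credit points divided by semester hours."""
    return credit_points / semester_hours


def campus_index(record):
    """Criterion (b): sum of the ratings of every office or honor held."""
    return sum(ACTIVITY_WEIGHTS[item] for item in record)


# Hypothetical student: 45 credit points earned over 15 semester hours.
print(point_hour_ratio(45, 15))
print(campus_index(["class president", "athletic letter"]))
```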

The “dependent variables” or further 
group of measurements above mentioned were 
as follows: (a) intelligence test scores 
(O.S.U.), (b) the Pressey Interest-Attitude 
Test, (c) the Thurstone Attitude Scale on 
Communism, and, (d) ratings on the follow- 
ing personality traits, (i) cooperativeness, 
(ii) sagacity, (iii) home background, (iv) 
emotional maturity, and (v) sophistication. 
The mid-rating of each of two groups of 
raters was selected, the two groups being 
(1) three members of the college faculty and 
(2) three officials of the student government 
association. The writer would emphasize the 
variety of the data gathered and emphasize 
especially the variety of relatively objective 
criterion variables. 
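
Taking the mid-rating of three raters amounts to taking the median of their three ratings. A minimal sketch follows; the five-point scale and the sample ratings are assumptions, since the rating scale itself is not described here.

```python
from statistics import median


def mid_rating(ratings):
    """Return the middle value of a group of ratings (the median)."""
    return median(ratings)


# Hypothetical ratings of one student on one trait, on an assumed 1-5 scale.
faculty_ratings = [3, 5, 4]
student_ratings = [2, 4, 4]

print(mid_rating(faculty_ratings))
print(mid_rating(student_ratings))
```

The median is insensitive to a single extreme rater, which may be why a mid-rating rather than an average was chosen.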


RESULTS 


The inter-relations of the criterion vari- 
ables with dependent variables were studied 
by means of the Toops correlation formula 
for the Hollerith machines. 

In analyzing the inter-correlations of vari-
able No. 1, i.e., number of dates, with the
other variables, the following observations are
suggested. Since 17 of the 19 inter-correlations




are less than .20, and for all practical pur-
poses approach 0, it would appear that the
number of dates had by a girl over a nine 
month interval may be taken as an independ- 
ent measure of sociality. The two correlations 
which are more than .30 are on the rating of 
sophistication, .39 for faculty rating and .35
for the student rating. It is possible that this 
may be tautological, i.e., girls who have dates 
are considered sophisticated — sophisticated 
girls are those who have dates. 

The inter-correlations of campus activities
(variable No. 2) with the other variables
show positive correlations of .49 with point-
hour ratio and .31 with intelligence, which would
seem to indicate that there is a tendency for
those qualities or abilities which make for
academic success to be similar to those qual-
ities or abilities which make for social success
with members of one’s own sex, as measured
by extra-curricular achievement. Interest-
Attitude Test No. 4 (admirations) shows a
negative correlation of .20. Since the scores 
on this test are immaturity scores, it is sug- 
gested that there is somewhat of a tendency 
for maturity of interest to be positively cor- 
related with extra-curricular participation. It 
is possible that relatively high correlations be- 
tween campus activities and personality rat- 
ings present another tautology, i.e., the raters 


TABLE I
SHOWING CORRELATIONS OF CRITERION VARIABLES WITH ALL OTHER VARIABLES

                                             Number of   Campus       Point-hour
                                             Dates       Activities   Ratio

 1. Number of dates                                       -.04         -.03
 2. Campus activities                         -.04                      .49
 3. Point-hour ratio                          -.03         .49
 4. Intelligence (O.S.U.)                      .03         .32          .56

PRESSEY INTEREST-ATTITUDE TEST
 5. [label illegible]                          .03        -.02         -.03
 6. [label illegible]                          .04         .07          .04
 7. [label illegible]                         -.17         .05         -.07
 8. Admirations                               -.03        -.20         -.15
 9. [label illegible]                         -.09        -.11         -.13

10. Thurstone Att. Scale toward Communism      .03         .22          .28

FACULTY RATINGS
11. Co-operation                              -.12         .61          .57
12. Sagacity                                   .02         .62          .81
13. Home background                            .08         .42          .43
14. Emotional maturity                        -.03         .45          .57
15. Sophistication                             .39        -.14         -.11

STUDENT RATINGS
16. Co-operation                              -.08         .66          .52
17. Sagacity                                   .01         .57          .52
18. Home background                            .06         .54          .56
19. Emotional maturity                        -.02         .45          .41
20. Sophistication                             .35        -.25         -.27










were unconsciously influenced by a halo effect 
and therefore rated favorably those students 
who were noted in campus affairs. The Thur- 
stone Attitude Scale on Communism is taken 
as a measure of open-mindedness on a highly 
controversial issue. A high score on this atti- 
tude scale means that a given student is will- 
ing to view the question on Communism ob- 
jectively and impersonally. The positive cor- 
relation of .22 indicates that insofar as this 
“burning” issue is concerned those students 
who participate extensively in extra-curricular 
activities are relatively more open-minded 
than those who do not participate. 

Inter-correlation of point-hour ratio (vari- 
able No. 3) with the variables of number of 
dates, campus activities, and Thurstone Atti- 
tude Scale on Communism, have already been 
discussed. The usual positive correlation of 
.56 between point-hour ratio and intelligence 
was found. The usual faculty attitude toward 
point-hour ratios as the chief end and aim of 
life is indicated by the positive correlation of 
.81 between faculty rating on sagacity and
point-hour ratio. 

Since the correlations from the Pressey 
Interest-Attitude Test, both sub-scores and 
total scores, were uniformly low, both nega- 
tively and positively, an item-analysis was 
made. 

The 160 subjects were separated into three 
groups on the basis of three criterion vari- 
ables, 50 high on each variable and 50 low 
on each variable, leaving an undistributed 
middle of 60. Those items which yielded a 
difference of 20 per cent or more between the 
upper and lower groups were selected for 
study. The 20 per cent difference produced a 
critical ratio of 2.8, but since there were a 
large number of differences all in the same 
direction, it was thought that this critical 
ratio was large enough to be reliable. 
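
The item selection just described can be sketched as below. The item names are taken from the text, but the endorsement counts are invented for illustration. The standard-error formula behind the reported critical ratio of 2.8 is not stated, so the unpooled formula for a difference between two proportions is used here as an assumption; its value depends on the proportions involved and need not reproduce the reported figure.

```python
from math import sqrt

GROUP_SIZE = 50  # 50 high and 50 low students on each criterion variable


def select_items(counts, threshold=20, n=GROUP_SIZE):
    """Keep items whose high and low groups differ by `threshold` per cent or more.

    `counts` maps an item to (endorsements in high group, endorsements in
    low group). Integer arithmetic avoids floating-point edge cases.
    """
    return [item for item, (high, low) in counts.items()
            if abs(high - low) * 100 >= threshold * n]


def critical_ratio(high_count, low_count, n=GROUP_SIZE):
    """Difference between two proportions divided by its standard error.

    Uses the unpooled standard error (an assumption, see lead-in).
    """
    p_high, p_low = high_count / n, low_count / n
    se = sqrt(p_high * (1 - p_high) / n + p_low * (1 - p_low) / n)
    return (p_high - p_low) / se


# Hypothetical endorsement counts for three items.
counts = {"science": (35, 20), "baseball": (22, 20), "clothes": (15, 25)}
print(select_items(counts))
print(round(critical_ratio(30, 20), 2))
```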

The results were about what one would ex-
pect: superior students were much more inter-
ested in “science,” “reading,” “studying,”
“education,” etc. The students who had a 
large number of dates were much more inter- 
ested in “clothes,” “fashion,” “personal ap- 
pearance,” “men,” “children,” and worried 
much more about examinations. 

Those students who were active in extra- 
curricular activities showed a preference for 
such items as “baseball,” etc., which were 
concerned with athletics and other campus 
affairs. 







CONCLUSION 


We may conclude from this study that 
(1) extra-curricular activities and scholarship 
are positively related, (2) insofar as young 
women are concerned, dates are an independ- 
ent and unique measure of sociality, (3) open- 
mindedness as measured by the Thurstone 
Attitude Scale on Communism seems to be 
slightly correlated with scholarship, intelli- 
gence, and extra-curricular activities, (4) the 
three measures of achievement used in this 
study, point-hour ratio, extra-curricular ac- 
tivities, and dates, seem to carry with them 
their own individual constellation of interests 
and attitudes. The Item-analysis of the 
Pressey Interest-Attitude Test shows that 
students with a high point-hour ratio show 
a preponderance of intellectual interests with 
little worry over examinations. Those who
have dates show a preponderance of those
interests which pertain to personal appear-
ance, and those high in extra-
curricular participation show a preponder-
ance of interests in athletics and related
activities. A Factor Analysis (Thurstone
Method) of the inter-correlations of both
faculty and student ratings on the five per-
sonality traits showed a double halo effect,
i.e., there was a marked tendency for both
groups of judges to rate high those students 
whose activities were largely on the home 
campus on all traits except sophistication and 
conversely to rate low on all traits except 
sophistication those students whose claim to 
distinction lay chiefly in their social success 
with the male sex. 


EDUCATIONAL IMPLICATION 


It is hoped that this study will aid in
dispelling a part of the fog of unsupported 
opinion in regard to the supposed antipathy 
of intelligence and scholarship on the one 
hand and extra-curricular activities and dates 
on the other hand. It appears that young 
women may participate in a wide variety of 
extra-curricular activities and at the same 
time achieve academic distinction. Since the 
correlations between dates on the one hand 
and intelligence and point-hour ratio on the 
other approach 0, the writer suggests that
the studious, intelligent, young woman may 
well offer effective competition to her more 
social and less academic sisters in the field 
of heterosexual social endeavor. 













THE McCAULEY TETRAHEDRON TEST 


LESLIE D. HAYES AND CHARLES A. DRAKE
West Virginia University 


This discussion of the McCauley Test is 
presented primarily because of the bearing 
of an analysis of the test upon the design 
of other tests. From this analysis certain 
principles are deducible which should be of 
value to all who undertake the construction 
of similar tests. The results indicate that it 
does not follow that a test should be sum- 
marily discarded because it does not meet 
the usual criteria of reliability and validity. 
Often the attempt to measure the test against 
the criteria will reveal difficulties in the way 
of obtaining higher values and indicate the 
modifications that should be made in design, 
administration, and interpretation. 


The test, as originally designed by 
Mr. W. J. McCauley¹ of the University of
Arizona, consists of six right tetrahedrons. 
These were cut from a block of wood 
r’x1'4"x2¥4” by sawing through the diag- 
onally opposite sets of edges, making three 
separate cuts with the saw. Each student was 
given a set of the six blocks numbered from 
one to six and a set of 144 drawings showing 
every possible position these blocks could oc- 
cupy relative to the planes of projection, each 
drawing showing three views of a block in 
orthographic projection. The task of the 
examinee was to choose the block represented 
in each drawing, covering as many drawings 
as possible in a given length of time. 


The hypotheses upon which the tests were 
based were that one of the fundamental traits 
of the engineer is ability to visualize in two 
or three dimensions, to perceive relationships 
in both, and to pass from one to the other. 
The motive was to provide a means for the 
objective measurement of this trait or set 
of traits. The hypothesis received support 
from a study made by the Engineering Foun- 
dation in 1930. Following this report several 
committees of the Society for the Promotion 
of Engineering Education undertook studies 
bearing upon the measurement of the traits 
in question. Since 1936 the latter effort has 
centered in a special committee of the Engin- 


¹ McCauley, W. J., The Tetrahedron Test of Power to
Visualize, The Journal of Engineering Education, 23:8:624-
627, April, 1933.


eering Drawing Division of this society, be- 
cause of the belief that descriptive geometry 
is the one subject that both develops and 
utilizes these traits. Prof. C. V. Mann, of the 
Missouri School of Mines, has been chair- 
man of this committee coordinating experi- 
mental work in testing the hypotheses. 

The untimely death of Mr. McCauley in 
1935 might have ended further development 
of his test had further experimentation not 
been taken up by Professor Mann’s commit- 
tee. Ten institutions have given the test to 
more than 2,000 students, but few of the re-
sults have thus far been published. McCauley’s
report in 1933 is the last and only compre- 
hensive summary of findings. 

Mann² and McCauley found r’s of .70 be-
tween scores on the test taken during the
second half hour of testing and grades in 
descriptive geometry awarded on the basis of 
objective tests. They found scores on the 
second half hour gave somewhat better cor- 
relations than scores on the first half hour 
and on the total hour of testing. 

Our results are based on a period of only
one-half hour, for 95 students. The correla- 
tion between scores for two administrations 
of the test, in February and again in May, 
is .41, a result that is disappointingly low 
and that would ordinarily justify rejection 
of the test. While McCauley reported an odd- 
even coefficient of reliability of .94, this re- 
sult must be interpreted with care. In general, 
time-limit tests having items of equal diffi- 
culty show misleadingly high odd-even cor- 
relations because of the application and 
measurement of approximately equal speeds 
per item on odds and evens, thus making the 
correlations reflect speed in performance. 
Hence, for speed tests, the only legitimate 
measurement of reliability is that derived 
from two forms of the test applied at different 
times. Mann has found the reliability suffi- 
ciently high to justify continuation of the 
experimentation, although he does not give 
a correlation coefficient. 
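
The argument about odd-even coefficients on speed tests can be illustrated with a deliberately simplified sketch. It assumes a pure speed test in which every item reached is answered correctly, so that the odd and even half-scores are fixed by the number of items reached. The item counts are invented and are not the data of this study.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_halves(items_reached):
    """Half-scores when every reached item is right (assumed pure speed test)."""
    odd = (items_reached + 1) // 2   # items 1, 3, 5, ...
    even = items_reached // 2        # items 2, 4, 6, ...
    return odd, even


# Hypothetical numbers of items reached by eight examinees on two occasions.
february = [12, 30, 18, 25, 40, 15, 22, 35]
may = [28, 20, 35, 18, 30, 33, 16, 27]

odd, even = zip(*(odd_even_halves(k) for k in february))
print(round(pearson(odd, even), 3))      # odd-even coefficient, one sitting
print(round(pearson(february, may), 3))  # test-retest coefficient
```

Under this assumption the odd-even coefficient is nearly perfect whatever the stability of the scores from one sitting to the next, which is the reason given above for preferring a coefficient based on two administrations.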

Our correlations between the first applica- 
tion of the test and grades in descriptive 

² Letter from Professor C. V. Mann dated Jan. 17, 1939.


















geometry and between the second application 
of the test and the same grades are .25 and 
.24, respectively. These results are too low to
justify the use of the test for sectioning a 
class prior to instruction. They also indicate 
that the trait measured by the test does not 
seem to be affected by the instruction in 
descriptive geometry that was given during 
the interval between tests. Distributions of 
the scores on the two tests are shown in 
Table I. 


TABLE I 


DISTRIBUTIONS OF SCORES 


Scores February May 
OE Oe SN cineca 1 
err 1 
fo eee 1 
/) Se RR boom 3 
OE i TD icici neiinaoe 1 2 
SS 7 ae = 
er 1 4 
ME rns tases comsatiencinte 6 4 
2 eee 5 8 
Ek TE eet 16 16 

CY aS 32 27 
—9 to _ SE Ee 25 22 

cote AI RII ini cdakentiveinendinnhencattiin 8 3 
—29 to —20 ...........-.-. 1 
c_ a aoe 95 5 


Inspection of this table reveals a dispro- 
portionately large group of scores of zero and 
below. Part of this is due to the method of 
scoring the test. This method assumes that 
by guessing alone, an examinee should be 
able to get right answers for one sixth of the 
number of items he tries, since there are only 
six blocks among which to choose. To offset 
the effect of guessing, the final scores were 
computed from the formula: Five times the 
number right minus the number wrong. Of 
course, only the average of the scores made 
by guessing alone will fall at zero while there 
will be a normal distribution of scores around 
this point. This accounts for the range of 
scores below zero but also carries the impli- 
cation that a similar range of scores above 
zero is attributable to the same cause. 
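
The scoring rule and its consequence under pure guessing can be checked with a short sketch. The formula (five times the number right minus the number wrong) and the chance of one in six are from the text; the treatment of each guess as independent, and the resulting standard deviation, are assumptions of the sketch.

```python
from fractions import Fraction
from math import sqrt

N_BLOCKS = 6  # six blocks among which to choose


def corrected_score(right, wrong):
    """Score formula from the text: five times the rights minus the wrongs."""
    return 5 * right - wrong


def expected_guessing_score(items_tried):
    """Mean score when every answer is a blind guess among six blocks."""
    p = Fraction(1, N_BLOCKS)
    return items_tried * (5 * p - (1 - p))


def guessing_sd(items_tried):
    """Standard deviation of the score under independent blind guesses.

    Each item contributes +5 with chance 1/6 and -1 with chance 5/6, so its
    mean is 0 and its variance is 25/6 + 5/6 = 5.
    """
    p = Fraction(1, N_BLOCKS)
    variance_per_item = 25 * p + (1 - p)
    return sqrt(items_tried * variance_per_item)


print(corrected_score(right=10, wrong=26))
print(expected_guessing_score(36))
print(guessing_sd(20))
```

On these assumptions an examinee who guesses at twenty items has an expected score of zero with a standard deviation of ten points, which is consistent with the spread of scores on both sides of zero noted above.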


The foregoing consideration suggests that 
higher correlations between the two tests and 
between the tests and grades might be ob- 
tained if this group of scores so strongly 
affected by guessing was removed from the 
calculations. When the 41 cases that seemed 
least affected by guessing were handled as a 
separate group they gave correlations of .37 




between the two sets of scores on the tests, 
.24 between the first test and grades, and 
.20 between the second test and grades, re-
sults but little different from those found for
the whole group previously. 

It is always possible to attribute a low 
correlation between a test and grades in a 
course to the well-known fact that the latter 
are themselves usually unreliable. If the cor- 
relations between scores on successive exam- 
inations in a course, as well as between grades 
in the same course for two terms or semesters, 
are usually not better than .75—as is the 
case—then correlations between such scores 
or grades and scores on a test that seems to 
be related to the abilities called for in the 
course will seldom be much higher than that 
figure. Where the grades are even less reliable 
—as is, again, often the case—the correlations 
between grades and test scores will tend to 
be still lower, also. 
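
The limiting effect of unreliable measures may be made concrete with the classical attenuation bound, under which the observed correlation between two measures cannot exceed the square root of the product of their reliabilities. This bound is supplied here as an illustration and is not a formula given by the present writers; treating the retest coefficient of .41 as the reliability of the test is likewise an assumption.

```python
from math import sqrt


def max_observable_r(reliability_test, reliability_criterion):
    """Classical ceiling on an observed correlation between two measures."""
    return sqrt(reliability_test * reliability_criterion)


# Figures from the text: grades seldom better than .75, retest r of .41.
print(round(max_observable_r(0.41, 0.75), 2))
```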

While this may explain the low correlations 
found between the grades and the test scores, 
it does not explain the low correlations be- 
tween the scores on the two applications of 
the test. Some other factor must be re- 
sponsible for this latter result. It can not be 
due to intelligence as measured by the A.C.E. 
test, since those who gained on the second 
application of the test (61% of the whole 
group) had an average percentile score of 
43 on the national norms, whereas the 39% 
who had losses or no change had an average 
percentile score of 45. Variabilities in the 
A.C.E. scores in the two groups were similar. 
(See Table II).


TABLE II
DISTRIBUTION OF GAINS AND LOSSES ON RETEST

Amount in                          Aver. Percentile Score
Score Points      Frequency        on the A.C.E. Test

 71 to  80             2
 61 to  70             4
 51 to  60        [illegible]
 41 to  50             1                    43
 31 to  40             7
 21 to  30             8
 11 to  20            14
  1 to  10        [illegible]
 -9 to   0            20
-19 to -10            14                    45
-29 to -20             2
-39 to -30             1


A study of the relative difficulty of the 
several blocks yields no basis for an explana- 
tion. The blocks numbered from one to six 













were identified with the appropriate drawings 
in 45, 42, 44, 48, 41, and 45% of the times 
they were tried, respectively, and each ap- 
peared 24 times among the 144 drawings 
arranged in random order in the testing pro- 
cedure. There was no indication of any 
significant differences in difficulty among 
either the blocks or the drawings. The random 
order was itself free from any discernible
bias. 

There is another possible explanation for 
the low correlations which follows from the 
observation that 39% or more of the group
were not able to improve their scores on the 
second trial. A test of this kind can usually 
be performed by insight, by systematic trial 
and error, and by random trial and error. 
It is not unusual to observe an examinee 
starting a test with random trials, then 
moving on to systematic trials, and finally 
arriving at such insight that selection of the 
pieces is made from perceptual cues without 
trial manipulations of the pieces themselves. 
Where this order of procedure is observed it 
implies a defect in the test itself, in the 
giving of instructions, or in both. The effect 
of the failure to arrive at insight is to impair 
both the reliability and the validity of the
test. This seems to be the case in our 
experience with the McCauley test. 


Prof. Mann has said that the test appears 
to be too difficult for Freshmen students. Our 
observation is that the instructions are not 
comprehended by many of the students at 
the end of the ten-minute period allowed for 
reading these directions and trying out four 
of the items. The obvious line of improve- 
ment here is to lengthen the practice period 
and to simplify the directions, making sure 
that every examinee understands the relation- 
ships between the drawings and the cor- 
responding tetrahedrons. 

However, this may not be enough. The 
initial difficulty of perceiving the relationships 
between drawing and solid may still remain, 
since a tetrahedron is such a complex figure 
that twenty-four drawings are made from it 
without including oblique views. It would 
seem to be better, in designing tests for three- 
dimensional perception, to begin with very 
simple solids well within the perceptual grasp 
of the most poorly endowed examinee. The 
successive solids might then progress in order 
of complexity with fewer alternatives from 
which the examinee must choose. 







In summary, it seems to us that the 
McCauley test in its present form has these 
defects: 


1. Its items do not have a progressive order 
of difficulty. 

2. It is often begun before the examinee 
comprehends the task—before he has 
gained insight. 

3. It begins at a level of difficulty much 
too high for many to whom it is given. 


4. It offers too many alternatives for effec- 
tive trial-and-error performance. 


5. The time limit should be increased to 
an optimum experimentally established. 


To remedy these defects it will be neces- 
sary to redesign the test, probably embodying 
some of the tetrahedrons in a longer series 
of solids. The principles to guide such design, 
also applicable to the original design of any 
similar test, follow from the above list of 
defects: 


1. It should have a series of items gradu- 
ated in a progressive order of difficulty from 
very easy to very difficult. 

2. Preliminary instructions and adequate 
practice in a trial period must continue until 
the examinee demonstrates by several suc- 
cessful performances that he comprehends 
the task. 

3. Its first and easiest items should be well 
within the ability of the most poorly en- 
dowed examinee to whom it is given. Con- 
versely, its most difficult items should tax 
the ability of the most highly endowed 
examinee. 

4. The alternatives for each choice should 
be limited to three or four, as among four 
rectangular solids but not counting such 
obvious incongruities as triangular pyramids, 
frustrums of cones, and similar solids which 
might also be items in the test. 

5. The maximum time limit should allow 
not more than three persons among one
thousand to complete the whole test. 

Prior experimentation by one of the present
writers³ with perception testing in two and
three dimensions supports the observation of
McCauley and Mann that such tests give
results poorly correlated with intelligence test


³ Drake, C. A., Inspection for Inspectors, American
Machinist, 82:17: 766-768, August 24, 1938.













scores, and thus seem to be testing a function 
that is different from general intelligence as 
measured by the usual tests. Results from 
the use of such tests in selecting inspectors 
for work in factories indicate that intelligence 
may be disregarded as a factor in such se- 
lection, except, possibly, in a few cases in 
which very high intelligence is known to 
contribute to early and rapid labor turnover. 
However, in drawing such conclusions, one 
must be on his guard against the confusion 
that ensues from applying labels to the func- 




tions that seem to be inferred from the tests. 
It should be borne in mind that Spearman 
contends that tests of perception are really 
tests of intelligence and that the usual 
intelligence tests are not. 

Perception testing is a fertile field for 
experimentation. The need is for more and 
better tests and for experimenters who can 
approach their task with reasonable freedom 
from the conventional notion that perception 
is nothing but a function of general intelli- 
gence as usually measured. 














THE RELATIONSHIP OF SELF-RATING AND CLASSMATE 
RATING ON PERSONALITY TRAITS 


MARGARET J. DRAKE, SYDNEY ROSLOW, AND GEORGE K. BENNETT¹
New York City


The difficulty of determining the validity 
of personality tests has long been recognized. 
In the case of measures of abnormal tenden- 
cies the standardization has often been based 
on the discrimination between “normals” and 
those displaying the given syndrome.² This
method is less appropriate for scales designed 
to measure traits which are not parallel to 
some psychiatric classification. The question- 
naire used in this study, An Inventory of 
Activities and Interests,³ is an example of the
non-psychiatric personality measure. This in- 
ventory purports to measure social initiative, 
self-determination, financial resourcefulness 
(“economic self-determination”), and adjust- 
ment to the opposite sex, as well as a com- 
bination of these, denoted as “overall per-
sonality.” The author of this inventory,
Henry C. Link, defines personality as “the
extent to which the individual has learned to
convert his energies into habits and skills
which interest and serve other people.”⁴ The
degree to which Link’s hypothesis is sub- 
stantiated by this inventory should be de- 
terminable by a comparison of the scores on 
the several scales (self-ratings) with ratings 
by intimate acquaintances. Two previous 
studies, one by the author, reported in the 
manual accompanying the form, and the other 
by an independent investigator, W. A. 
Thompson,⁵ have demonstrated the existence
of some correspondence between scores and 
external evaluations of personality. Link used 
as a criterion group possessing effective per- 
sonality, teachers’ selection of pupils regarded 


¹ The psychometric data were obtained by Mrs. Drake,
Chairman, Guidance Bureau, James Monroe High School. The
experiment was planned and directed by Bennett and executed
by Roslow. The authors are indebted to Dr. Henry E. Hein,
Principal of the James Monroe High School for his coopera-
tion in this study.

² Among the questionnaires so validated are the Psycho-
Somatic Inventory, Ross A. McFarland and Clifford P. Seitz,
“A Psycho-Somatic Inventory,” Journal of Applied Psychology,
22, 1938, 327-339, and the Humm Wadsworth Temperament
Scale, American Journal of Psychiatry, 92, p. 163 ff.

³ Also known as the P.Q. or Personality Quotient Test, by
Henry C. Link, The Psychological Corporation, New York.
1938.

⁴ Manual for the P.Q. (1938 revision) op. cit.

⁵ William A. Thompson, “An Evaluation of the P.Q. (Per-
sonality Quotient) Test,” Character and Personality. 1938, 6,


as leaders by their classmates. High scores in 
social initiative and overall personality were 
found to be characteristic of this group. 
Thompson reports that the selection by the 
deans of those children possessing the most 
and least amount of each trait agreed essenti- 
ally with the test results. In both these 
investigations the ratings involved were those 
of teachers. 


PROBLEM 


In this experiment an attempt is made to 
compare the individual’s score on each scale 
of the Inventory of Activities and Interests 
with the ratings of coeval associates. If the 
inventory measures the possession of the 
habits of personality along the axes of its 
several scales, there should be a positive re- 
lationship between the responses of the 
individual and the judgments of his fellows. 
In this instance, self-ratings were determined 
from the responses to the items of the ques- 
tionnaire and classmate ratings from the 
composite judgment of other members of the 
class. 

SUBJECTS 


The subjects consisted of the members of 
three honor classes at the James Monroe 
High School, New York City. The students 
in these classes represent a superior high 
school population (mean I.Q. 127). No selec- 
tion factor other than chance determined the 
division of the students among these three 
sections. The enrollment in these sections 
included 70 boys and 44 girls in the tenth 
grade. 

The population of these classes was 
unusually stable. Each pupil had been a 
member of his present section for at least 
one and one-half school years. Because of the 
wide area of the city from which these 
classes are drawn, association outside of 
school is infrequent. 


METHOD 


To obtain the ratings of classmates for 
each individual a question was prepared de- 




signed to epitomize the characteristic sup- 
posedly measured by each scale of the 
inventory. These questions and designations 
of the corresponding traits are listed below: 


Question and Trait

1. Is the person named more popular than
most others in the class?
Trait: Overall personality, from which the
Personality Quotient is determined.

2. Is the person named more friendly and
sociable than most others in the class?
Trait: Social Initiative.

3. Does the person named know what he wants
to do next better than most others in the
class?
Trait: Self-determination.

4. Is the person named more interested in
earning or saving money than most others
in the class?
Trait: Economic Self-determination.

5. Does the person named get along with mem-
bers of the opposite sex better than the
average of the class?
Trait: Adjustment to the opposite sex.


For each class a number of mimeographed 
answer sheets were prepared. Each sheet con- 
tained an alphabetical list of the names of 
the members of the class. Opposite each name 
appeared “yes”, “no”, and “?”. One such 
sheet, numbered to indicate the question put, 
was distributed to each member of the class. 
The students were then given these instruc- 
tions: 

“On this sheet are the names of the mem- 
bers of this class. After each name you 
will see “Yes”, “No”, and a question mark. 
The question to be answered for each 
name, except your own, is: ‘Is the person 
named more popular than most others in 
the class?’ If your answer is Yes, draw a 
circle around Yes. If your answer is No, 
draw a circle around No. If you can’t 
answer Yes or No, draw a circle around the 
question mark. Remember the question is: 
“Is the person named more popular than 
most others in the class?’”’ (Question 1) 


At this point the administrator wrote the 
question on the black board. When the stu- 
dents had completed answering this question 
for each name listed, the answer sheets were 
collected. A second sheet was distributed and 
the instructions were repeated for the second 
question. This procedure was continued for 
the remaining three questions. 

A cross tabulation by question was then 
made of the frequency of “yes”, “no”, and




“?” responses for each name. Arbitrary 
weights of 2, 0, and 1 were assigned respec- 
tively to these responses and a “rating 
score” was obtained for each individual for 
each question. 
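
The rating score is a weighted tally, which may be sketched as follows. The weights of 2, 0, and 1 are those stated above; the sample responses are invented.

```python
# Weights stated in the text for "yes", "no", and "?" responses.
WEIGHTS = {"yes": 2, "no": 0, "?": 1}


def rating_score(responses):
    """Sum the weights of all classmates' responses about one pupil."""
    return sum(WEIGHTS[response] for response in responses)


# Hypothetical pupil rated by 37 classmates on one question.
responses = ["yes"] * 20 + ["no"] * 10 + ["?"] * 7
print(rating_score(responses))
```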

During the next week the Inventory of 
Activities and Interests was administered by 
the guidance counsellor of the school who had 
also supervised the classmate rating program. 
No mention was made to the pupils of any 
possible connection between ratings and 
questionnaire, nor were any results made 
available to the students. The completed 
inventories were scored and checked for each 
scale. 


RESULTS 


In each of two sections, 38 pupils were 
present during the session in which the ratings 
of classmates were made. In the third section 
32 were present, although the enrollment was 
also 38. Since each pupil answered each ques- 
tion in respect to every member of the class 
except himself, for 76 of the cases classmate 
ratings were made by 37 individuals. Of the 
remaining 38 cases, 32 were rated by 31 class- 
mates and 6 by 32. The rating totals for the 
members of the third group were multiplied
by appropriate constants to compensate for 
the smaller number of raters. 
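
The constants used are not stated. One plausible choice, assumed here purely for illustration, is the ratio of the full panel of 37 raters to the number of raters actually present, which puts every total on the footing of 37 raters.

```python
FULL_PANEL = 37  # raters per pupil in the two complete sections


def scaled_total(rating_total, n_raters):
    """Scale a rating total to a 37-rater basis (assumed constant: 37 / n)."""
    return rating_total * FULL_PANEL / n_raters


# Hypothetical totals for pupils rated by 31 and by 32 classmates.
print(scaled_total(31, 31))
print(scaled_total(40, 32))
```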

Consistency of the ratings within each class 
for each question was determined by a divi- 
sion of the raters into two equal groups and 
the computation of the correlation between 
the two series of total ratings. These coeffi- 
cients, corrected by the Spearman—Brown 
formula for the full number of raters and 
averaged by Fisher’s z function, are given in 
Table I. 
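
The two corrections named above can be sketched briefly. The Spearman-Brown formula steps a half-panel correlation up to the full number of raters, and Fisher's z function averages coefficients on the transformed scale. The three split-half values are invented, and an unweighted average over the three sections is an assumption, since the weighting is not described.

```python
from math import atanh, tanh


def spearman_brown(r_half, factor=2):
    """Estimate the reliability of a panel `factor` times as large."""
    return factor * r_half / (1 + (factor - 1) * r_half)


def fisher_average(coefficients):
    """Average correlations through Fisher's z (unweighted, by assumption)."""
    z_values = [atanh(r) for r in coefficients]
    return tanh(sum(z_values) / len(z_values))


# Hypothetical split-half correlations, one per class section.
half_panel_rs = [0.80, 0.90, 0.85]
corrected = [spearman_brown(r) for r in half_panel_rs]
print(round(fisher_average(corrected), 2))
```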


TABLE I

THE CONSISTENCY OF RATINGS OF 114 INDIVIDUALS BY THEIR
CLASSMATES (31 TO 37 RATERS FOR EACH INDIVIDUAL) ON EACH
OF THE QUESTIONS

Question                                                 r

1. Is the person named more popular than
   most others in the class?                            .95

2. Is the person named more friendly or
   sociable than most others in the class?              .83

3. Does the person named know what he
   wants to do next better than most
   others in the class?                                 .85

4. Is the person named more interested in
   earning or saving money than most
   others in the class?                                 .79

5. Does the person named get along with
   members of the opposite sex better than
   the average of the class?                            .93










The consistency of these ratings appears to 
be reasonably satisfactory with the possible
exception of question 4, dealing with earning 
and saving money. Lower consistency for this 
question may perhaps be explained by the 
paucity of extra-school association among 
these students. Financial astuteness will
probably find relatively less opportunity for 
display within the school than will the other 
traits. 

The correlation between ratings of the 
several questions has been determined for two 
of the sections. These coefficients are presented
in Table II.


TABLE II* 


AVERAGE COEFFICIENTS OF CORRELATION (FISHER'S z)
BETWEEN COMPOSITE RATINGS OF 76
STUDENTS OF BOTH SEXES


Question        2        3        4        5
   1         (values illegible in source)
   2                   .48      .20      .73
   3                            .48      .44
   4                                    -.12


*It may be remarked in passing that this 
pattern of coefficients is similar to that given 
by the coefficients between scales in the manual
for the Inventory of Activities and Interests 
(Link, op. cit. p. 10). 







At the time the Inventory of Activities 
and Interests was administered several of the 
students were absent, reducing the total num- 
ber to 98. Separate parallel forms of the 
questionnaire are used for boys and girls. 
Since scores from these forms are not directly 
comparable, the sexes are treated inde- 
pendently from this point on. 

Means and standard deviations of inventory 
scores for the experimental group are con- 
trasted in Table III with similar data for 
comparable grades reported in the manual. 
Significantly higher mean scores are obtained 
by both sexes of the experimental group in 
Self-Determination and by girls in Overall 
Personality (P.Q.). 

It seems probable that these differences are 
unimportant for the present experiment 
especially in view of Thompson’s statement’ 
that persons with high P.Q. Test scores tend 
to have a slight advantage in academic compe- 
tition and the fact that the norms reported 
in the manual are based on both 9th and 10th
grades. 

Table IV shows the coefficients of correla- 
tion between the score on each scale of the 
Inventory of Activities and Interests and the 

* Thompson, op. cit. 


TABLE III 


MEANS AND STANDARD DEVIATIONS OF EXPERIMENTAL AND STANDARDIZATION 
GROUPS FOR BOTH SEXES 


                       Boys                               Girls
          Experimental    Standardization    Experimental    Standardization
Scale       M     S.D.      M      S.D.        M     S.D.      M      S.D.
P.Q.      103.3   18.0    100.0    17.0      113.4   14.3    100.0    17.0
S.I.      (illegible)      67.4    15.4       66.6   11.9     60.8    13.0
S.D.       65.8   13.2     59.2    12.8       72.0   11.2     62.5    12.5
E.S.D.     34.8    9.6     36.3     8.4       34.3    6.0     34.7     6.8
S.X.       21.0    7.2     22.4     7.0       31.2    5.8     26.8     6.6
          N = 55          N = 462            N = 43          N = 430

TABLE IV 


COEFFICIENTS OF CORRELATION BETWEEN SCORES ON EACH SCALE OF THE INVENTORY AND 
COMPOSITE RATINGS ON THE CORRESPONDING QUESTION 


                  Q.1 vs P.Q.   Q.2 vs S.I.   Q.3 vs S.D.   Q.4 vs E.S.D.   Q.5 vs S.X.

Boys, N = 55
r                     .60           .45           .51            .14            .65
r (corrected)*        .62           .49           .55            .16            .67
P.E.                  .06           .07           .06            .09            .05

Girls, N = 43
r                     .45           .41           .35            .19            .55
r (corrected)*        .46           .45           .38            .21            .57
P.E.                  .08           .08           .09            .10            .07


* Corrected for unreliability of criterion only: $r'_{ic} = r_{ic} / (r_{cc})^{1/2}$.
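The correction named in the footnote to Table IV can be checked numerically. The sketch below assumes the criterion reliabilities are the consistency coefficients of Table I; the function name is hypothetical.

```python
import math


def correct_for_criterion_unreliability(r_ic, r_cc):
    """Divide the observed correlation by the square root of the criterion reliability."""
    if not 0.0 < r_cc <= 1.0:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return r_ic / math.sqrt(r_cc)


# Boys, Question 1 vs P.Q.: observed r = .60, consistency of ratings = .95.
print(round(correct_for_criterion_unreliability(0.60, 0.95), 2))
```

Under this assumption the boys' entries .60, .45, and .14 become .62, .49, and .16, matching the corrected row of the table.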
















composite rating by classmates on the
corresponding question.

It will be observed that significant positive 
relationship exists between each scale and the 
corresponding question with the exception of 
Scale E.S.D. and Question 4. There is also 
an apparent tendency for the coefficients to 
be higher for the boys than for the girls. 

If these coefficients were obtained between 
the scales and some more usual criteria it 
would be necessary to conclude that only 
slight correspondence existed between the 
scores and external measures of these traits. 
However, two factors peculiar to the present 
technique probably reduce the validity of the 
criterion. First, it is obviously impossible to 
convey in one sentence to a relatively naive 
individual the essential characteristics of a 
trait. Second, although the composite ratings 
reflect the opinions of many judges, the 
response of each judge is only the answer to 
a single question. The extent of correlation 
between one question and any other variable 
is necessarily restricted by its low discrimina- 
tory capacity. 

The lack of significant correlation between 
Question 4 and Scale E.S.D. was not entirely 
unexpected. It has already been mentioned 
that the pupils were poorly acquainted with 
each other outside of school. The observa- 
tions of classmate behavior with respect to 
economic resourcefulness could for the most 
part have been made only in extra-school 
activities. There are two possible explana- 
tions of the lower correlation between in- 




ventory scores and classmates’ ratings for the 
girls. In the first place the higher mean 
scores and smaller standard deviations (as 
shown in Table III) suggest that the girls 
may be a more select group with a restricted 
range. In the second place the rating of girls’ 
personality may be less accurate because, as 
Thompson has suggested,’ their behavior is 
less overt than the behavior of boys. 


SUMMARY AND CONCLUSIONS 


A comparison was made between scores 
on each scale of the Inventory of Activities 
and Interests and classmate ratings on ques- 
tions designed to express the central aspect of 
each trait. Fifty-five boys and forty-three girls 
served as subjects for both ratings and 
questionnaire. The consistency of composite 
ratings by classmates ranged from .79 to .95.


The correlation between self-rating by 
means of the inventory and the rating of 
classmates for four of the five traits ranged 
between .49 and .67 for boys and between 
.38 and .57 for girls. The fifth trait, economic 
self-determination, was less closely related to 
classmates’ opinion for both sexes. This result 
is believed to be a function of the lack of 
extra-school association among the subjects. 


Evidence is given that significant positive 
correlation exists between four scales of the 
P.Q. Test and the judgments of classmates. 
The nature of the criterion precludes the 
usual interpretation of these coefficients. 

* Thompson, op. cit.













A STUDY OF THE VALIDITY OF SOME CARDIO- 
VASCULAR TESTS** 


LEONARD A. LARSON 


Springfield College
Springfield, Mass. 


INTRODUCTION 


Since 1900 many cardio-vascular tests 
have been developed. These tests were the 
natural result of the rapid advance in the 
knowledge of the physiology of the 
circulatory-respiratory systems. The purpose 
in the development of these tests was to 
secure information as to the functional effi- 
ciency of the circulatory-respiratory systems. 
The test items used during this period are 
the normal values for systolic, diastolic, and 
pulse pressures; normal breath-holding abil- 
ity, and after-exercise breath-holding ability; 
the reaction of the circulatory-respiratory 
systems to graded exercise. The test items 
have been combined into test batteries by use 
of clinical judgement and statistical methods, 
to give more meaningful physiological in- 
formation than that given by a single test 
item. 

Many physiologists have questioned the 
validity of cardio-vascular tests. Many have 
raised questions about the relationship be- 
tween the various tests. Are they specific 
tests, or do they all give information as to 
the general body fitness? Do they indicate 
physiological changes in training and illness? 
It is the purpose of this research to attempt 
to answer these questions. 


Purpose of Research 


The purpose of this research is fourfold: 
(1) to determine the consistency between 
cardio-vascular tests in grading physiological 
efficiency, (2) to determine the significance 
of the various selected tests in indicating the 
physiological changes as the result of train- 
ing or illness, (3) to determine the validity 
of the eleven selected tests, and (4) to com- 
bine significant test items into the best test 
battery as an indicator of physiological 
efficiency. 

* An abstract of a dissertation submitted in partial fulfillment
of the requirements for the Ph. D. degree in the School
of Education, New York University, 1938.

** This is a continuation of a series of researches conducted
by Dr. J. H. McCurdy and L. A. Larson in Cardio-vascular
Efficiency.


Statement of Problems 

The research includes the application of 
eleven cardio-vascular tests to four typical 
physiological groups of subjects of the college 
age range (seventeen to twenty-four). Two 
of these groups—varsity (60) and Olympic 
(40) swimmers, and Springfield College 
freshmen (500)—are typically efficient or 
“Good” in respect to cardio-vascular efficiency.
Approximately three-fourths of the
freshman group are physical education students.
The third group is represented by one
hundred thirty-eight Springfield College
infirmary patients who had “Poor” cardio- 
vascular efficiency. These subjects were in 
bed for two or more days with temperature 
resulting from some organic ailment, such as 
colds, grippe, etc. The examinations were 
made when body temperature was normal. 
The case studies constitute the fourth group. 
Two groups of subjects are included in the 
case study analysis; the first consists of 
twenty-seven subjects examined in the fall, 
and again after six weeks of training for 
swimming; the second consists of seventy- 
three subjects examined in the fall, and again 
after confinement to the infirmary for two or 
more days. These four groups of subjects— 
varsity and Olympic swimmers, infirmary, 
college freshmen, and case study—are used 
to answer the four questions suggested as the 
purposes of this research. 

The study is limited to male students of 
the college age range from seventeen to 
twenty-four. Only organically sound subjects 
were included in the four groups. Those 
with heart defects which affect function were 
eliminated. The “Poor” groups differed from 
the “Good” groups in some functional change 
in the circulatory-respiratory systems. 


EXPERIMENTAL PROCEDURE 


Eleven cardio-vascular tests were selected 
for the examination of the four groups of 
subjects. These tests were: McCloy Test, 
Barach Test, Stone Test, Tigerstedt Test, 















Basal Metabolic Test, Difference between 
Standing and Horizontal Pulse Rate Test, 
Difference between Standing and Horizontal 
Systolic Blood Pressure Test, Pulse Pressure 
times Pulse Rate Test, Pulse Pressure times 
Pulse Rate divided by Diastolic Pressure 
Test, Crampton Blood Ptosis, and the 
McCurdy—Larson Organic Efficiency Test. 
The organization of the groups for statis- 
tical treatment was as follows: (1) the fresh- 
man group of approximately 450 subjects 
(“Good” group) was used to indicate the 
consistency between the eleven tests in classifying
these subjects;¹ (2) the efficient groups
(“Good” athletic, “Good” freshmen, and
“Good” athletic and freshmen) were compared
with the inefficient group (“Poor”
infirmary patients) to determine the validity
of the various cardio-vascular tests; (3) the 
results of the examination on twenty-seven 
subjects in the fall were compared with those 
in the swimming examination after six weeks 
training, also the results of the fall examina- 
tion of seventy-three subjects were compared 
with the results of the infirmary examination 
of these subjects after they had spent two or 
more days in bed with increased temperature; 
and (4) the significant test items were de- 
termined by using the athletic group (varsity 


¹ The original group consisted of 460 subjects. All the measurements
were not secured for every subject, however, and
some questionable measurements were discarded. The range of
subjects is therefore from 460 to 308 (see Table I).




and Olympic swimmers) and the infirmary 
patients as the criteria. 

The statistical techniques used, with the 
exception of the product-moment correlation 
and the multiple factor analysis to determine 
test consistency, were those which express the 
significance of the difference between groups. 
These are the mean with the standard error, 
the difference in the means with the standard 
error of the difference, the critical ratio, and 
the bi-serial correlation. 
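The critical ratio listed above can be sketched as the difference in means divided by the standard error of that difference. The sketch assumes two independent groups, so that the standard error of the difference is the root of the summed squared standard errors; the scores are hypothetical.

```python
import math
from statistics import mean, stdev


def critical_ratio(group_a, group_b):
    """Difference in means divided by the standard error of the difference.

    Assumption: the groups are independent samples.
    """
    se_a = stdev(group_a) / math.sqrt(len(group_a))
    se_b = stdev(group_b) / math.sqrt(len(group_b))
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (mean(group_a) - mean(group_b)) / se_diff


# Hypothetical test scores for a "Good" and a "Poor" group.
good = [10, 12, 14]
poor = [4, 6, 8]
print(round(critical_ratio(good, poor), 2))
```

For the case studies, where the same subjects are examined twice, a paired standard error would be the more natural choice; the abstract does not say which was used.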

The data for the study were secured at 
Springfield College and Yale University. The 
freshmen, varsity swimmers, and infirmary 
groups were Springfield College students. The 
Olympic subjects were examined while in 
training at Yale University in preparation 
for the Olympic Games in Germany. The 
Olympic swimmers were under the direction 
of Coach R. J. H. Kiphuth of Yale 
University.¹


CONSISTENCY OF CARDIO-VASCULAR 
TESTS IN CLASSIFICATION 


The purpose of a cardio-vascular test is 
to discover the unfit and to determine the 
classification of the physiologically fit. The 
eleven cardio-vascular tests should therefore 


1 The writer wishes to express appreciation to Coach R. J. H. 
Kiphuth for his cooperation and encouragement in this experi- 
mental work. The examinations were made by Dr. J. H. 
McCurdy and L. A. Larson just before the athletes sailed for 
Germany. 


TABLE I 
CONSISTENCY OF TESTS IN INDICATING CARDIO-VASCULAR EFFICIENCY¹

                  Barach     Stone      Basal      Tigerstedt Diff. in   Diff. in   P.P.×P.R.  P.P.×P.R.  Crampton   Org. Eff.
Tests                                   Metab.                P.R.       Systolic              ÷ Dias.

McCloy            -.4628     -.2856     -.5844     -.2651     -.3704      .3044     -.4212     -.5172      .4166     -.1654
                  ±.025      ±.029      ±.021      ±.029      ±.030      ±.029      ±.026      ±.023      ±.029      ±.028
                  (454)      (454)      (460)      (454)      (375)      (429)      (454)      (454)      (428)      (545)

Barach                       -.0044      .7202     -.0133      .4131      .1007      .4276      .2644     -.0405     -.3851
                             ±.031      ±.015      ±.033      ±.030      ±.032      ±.026      ±.030      ±.033      ±.024
                             (456)      (460)      (429)      (336)      (428)      (455)      (429)      (426)      (592)

Stone                                    .4342      .6803     -.1838     -.0525      .5560      .6125      .0178      .6473
                                        ±.026      ±.017      ±.033      ±.032      ±.022      ±.020      ±.032      ±.016
                                        (455)      (434)      (388)      (432)      (453)      (456)      (437)      (593)

Basal Metabolic                                     .6079      .2596      .0991      .8925      .8126      .0668      .2177
                                                   ±.020      ±.030      ±.032      ±.007      ±.011      ±.032      ±.030
                                                   (459)      (433)      (434)      (458)      (459)      (433)      (457)

Tigerstedt                                                    -.2192      .1254      .8547      .7815      .2944      .2825
                                                              ±.035      ±.032      ±.009      ±.013      ±.030      ±.026
                                                              (359)      (428)      (459)      (428)      (427)      (593)

Diff. in Hor.-                                                           -.1969†    -.0140†    [illeg.]†   .5656     -.2466
Std. P.R.                                                                ±.035      ±.035      ±.038      ±.023      ±.031
                                                                         (387)      (379)      (308)      (394)      (428)

Diff. in Hor.-                                                                      [illeg.]   -.0789      .6941     -.0497
Std. Systolic                                                                       ±.032      ±.032      ±.017      ±.032
                                                                                    (424)      (426)      (427)      (427)

P.P.×P.R.                                                                                       .9128      .047       .0392
                                                                                               ±.005      ±.032      ±.028
                                                                                               (456)      (427)      (594)

P.P.×P.R.                                                                                                  .0607      .2050
÷ Diastolic                                                                                               ±.032      ±.027
                                                                                                          (427)      (598)

Crampton                                                                                                              .0770
                                                                                                                     ±.032
                                                                                                                     (427)


¹ Content of cells: (1) correlation, (2) probable error, and (3) number of cases.

† One coefficient in this row is illegible in the source, so the column positions of the first two values are uncertain.








TABLE II
ANALYSIS OF CARDIO-VASCULAR TESTS

[The body of Table II is not legible in the source. Its recoverable layout: for each of the eleven tests the table gives (a) Factor Analysis: rotated factor loadings I to IV, communality (h²), and uniqueness (1 - h²); (b) Validity, as bi-serial correlation (r bis.) and critical ratio (C.R.), for three criteria: “Good” swimmers (100) against “Poor” infirmary (138), “Good” freshmen (500) against “Poor” infirmary (138), and “Good” freshmen and swimmers (600) against “Poor” infirmary (138); (c) Case Studies: Fall and Swimming (N = 27) and Fall and Infirmary (N = 73).]

Bi-serial correlation: formula illegible in the source.

Critical ratio: $\mathrm{C.R.} = (M_1 - M_2)/\sigma_D$.

* Value greater than one is theoretically impossible. It is the result of approximating communalities in diagonals.



agree in the separation of the efficient subjects 
from the inefficient. The eleven test scores, 
secured on approximately 450 college fresh- 
men, or the “Good” group, were intercor- 
related to determine the degree of consistency. 
These correlations are presented in Table I. 
Only four relationships (Pulse Pressure ×
Pulse Rate Test with Basal Metabolic Test,
Pulse Pressure × Pulse Rate Test with Tigerstedt
Test, Pulse Pressure × Pulse Rate
divided by Diastolic Test with Basal Metabolic
Test, and Pulse Pressure × Pulse Rate
divided by Diastolic Test with Pulse Pressure
× Pulse Rate Test) out of 55 reached
or exceeded the .80 standard of significant
consistency.¹
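Two figures in this passage can be checked with a short sketch: the count of 55 relationships among eleven tests, and the probable errors printed under each coefficient in Table I. The abstract does not state the probable-error formula; the sketch assumes the usual form 0.6745(1 - r²)/√N.

```python
import math


def probable_error_r(r, n):
    """Probable error of a product-moment r (assumed textbook formula)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


n_tests = 11
n_pairs = n_tests * (n_tests - 1) // 2  # distinct pairs of tests
print(n_pairs)

# McCloy vs Barach as printed in Table I: r = -.4628, N = 454.
print(round(probable_error_r(-0.4628, 454), 3))
```

Under this assumption the computed probable errors agree with the printed ones for most legible cells, which is also a useful check on doubtful digits in the scanned table.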

Two reasons for these relationships and 
the lack of relationship between the eleven 
cardio-vascular tests can be advanced. The 
first is that the tests are specific in their 
indication of some function of the circulatory- 
respiratory system, and these indicators are 
unrelated to other functions of the same 
systems. The second is that the tests lack 
validity, and therefore do not agree in classi- 
fication. The first reason is analyzed by use 
of Thurstone’s Multiple Factor Method; the 


second by establishing criteria for test
validity determination.²

The eleven cardio-vascular tests are
described by four factors. The Basal Metabolic,
Tigerstedt, Pulse Pressure × Pulse
Rate, and Pulse Pressure × Pulse Rate
divided by Diastolic Tests correlate highly
(above .82) with factor one. This shows that
factor one is circulation resistance as indicated
by diastolic pressure.³

In factor two, two highly significant 
correlations were found: Difference between 
Standing and Horizontal Pulse Rate Test, 
and Crampton’s Blood Ptosis Test. The factor 
was therefore identified as splanchnic vaso- 
motor efficiency as indicated by the relationship
between systolic pressure and pulse rate
in the standing position as compared to the 
horizontal. 

Four tests correlate significantly with 
factor three. These were: Difference in 
Systolics, Barach, Crampton, and Organic 
Efficiency Tests. After analyzing the tests it 


¹ See Table I.

² See Table II for results of factor analysis and validity
coefficients.

³ Rotated factor loadings determined by the calculation
method yield only one of a number of possible solutions.
Other values may be secured by use of the graphic method;
however, in this problem the identifications were possible
using the calculation method.







was concluded that the factor is heart energy 
during systole in excess of diastolic pressure. 


In factor four, the most significant correla- 
tion was found in the Organic Efficiency Test. 
This seems to indicate that the factor is
respiratory efficiency. 

These four factors describe the eleven tests
to a range of from 51.30 percent to 100 percent;
the percentage of uniqueness of the tests
ranges from 48.70 to 0 percent.
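The percentages above follow from the usual relation between loadings, communality, and uniqueness, sketched here for orthogonal factors. The loadings are hypothetical, since the table of loadings is not legible in this copy.

```python
def communality(loadings):
    """h squared: the sum of squared loadings on the common factors."""
    return sum(a * a for a in loadings)


def uniqueness(loadings):
    """The share of variance not described by the common factors, 1 - h squared."""
    return 1.0 - communality(loadings)


# Hypothetical loadings of one test on factors I to IV.
loadings = [0.60, 0.30, 0.20, 0.15]
print(round(communality(loadings), 4), round(uniqueness(loadings), 4))
```

A test with a communality of .5130 therefore has a uniqueness of .4870, which is the lower end of the range reported in the text.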


PHYSIOLOGICAL CHANGES IN TRAINING 
AND ILLNESS 


The purpose in the case study was to 
determine which of the eleven tests were 
significant in indicating the physiological 
changes in training and in illness. To de- 
termine the effects of training, the fall exam- 
inations of twenty-seven subjects were 
compared with the swimming examinations 
after six weeks training for varsity swimming 
competition; to determine the effects of ill- 
ness, the fall examinations were compared 
with the infirmary examinations of seventy- 
three subjects after being in bed for two or 
more days with some organic ailment. 


Only two tests of the eleven have significant
differences between the means in the
training group¹ (Organic Efficiency Test and
Barach Test). Four tests of the eleven
showed a high degree of significance between
the means of the fall and infirmary examinations¹
(Organic Efficiency Test, Stone Test,
Tigerstedt Test, and Pulse Pressure × Pulse
Rate divided by Diastolic Test).


¹ See Table II for complete results.



VALIDITY OF CARDIO-VASCULAR TESTS 


To determine the validity of the eleven 
cardio-vascular tests, three typical physiolog- 
ical groups were used as the criteria: ““Good” 
(varsity and Olympic swimmers) and “Poor” 
(infirmary); “Good” (college freshmen) and 
“Poor” (infirmary); and “Good” (swimmers 
and freshmen) and “Poor” (infirmary). The 
statistical methods described under “Experimental
Procedure” were used to determine the
significance of the test differences between
the “Good” and “Poor” groups.¹ The order
of significance in terms of the three criteria 
is: Organic Efficiency Test, Stone Test, and 
Tigerstedt Test. The remaining tests have at 
most a slight degree of significance. 


RELIABILITY AND OBJECTIVITY 


The reliability of the three significant 
tests and of all test items was determined
by Larson, using 21 subjects. The reliability 
coefficients ranged from .6708 to .9740. The 
objectivity of all test items and the two most 
significant tests was determined using student 
examiners. The correlations of objectivity 
ranged from .4150 to .8812 (Table III). The
experimental conditions for this experiment,
however, were not satisfactory. The examiners
were not experienced and they were
constantly hurried in their measurements. 
These experimental conditions could only 
lead to greater fluctuations in the test scores. 


COMBINATION OF SIGNIFICANT TEST ITEMS 


The purpose in the combination of the test 
items was to develop, if possible, a cardio- 
vascular test which has a higher degree of 


TABLE III 
RELIABILITY AND OBJECTIVITY OF PHYSIOLOGICAL MEASURES 
                                                     Reliability          Objectivity
                                                     Examiner: Larson     Examiner: students
Tests                                                21 subjects          180 subjects

(label illegible)                                    .9715                .7183
(label illegible)                                    .9599                .8790
(label illegible)                                    .9536                .7525
(label illegible)                                    .9451                .4150
(label partly illegible) ... Pressure                .8963                .6886
(label illegible)                                    .9466                .6646
(label illegible)                                    .8232                .6213
(label illegible)                                    .9740                .8812
Breath-Holding after Ex.                             .7428                .6470
Standing Pulse Rate minus P.R. 2 mins. after Ex.     .8320                .6111

Short Organic Efficiency Test                        .7954
(label illegible)                                    .6708                .5018
(label illegible)                                    .8878                .4750




validity than any of the eleven tests in this 
research. The significant test items were de- 
termined by use of the “Good” group (varsity 
and Olympic swimmers) and the “Poor” group 
(college infirmary patients). Using these two 
typical physiological groups nine significant 
test items were found.* The multiple correlation
procedure was used with the good-poor
criteria to determine the most efficient test 
battery. Ten physiological combinations were 
made for multiple correlation calculations. A 
short test consisting of three items, (sitting 
diastolic pressure, breath-holding after exer- 
cise, and standing pulse pressure), with a 
multiple correlation of .7501 was devised. It 
has a higher degree of validity than any other 
test in the study, except the McCurdy— 
Larson Organic Efficiency Test; its validity 
coefficient being .7216 as compared to .7913 
for the Organic Efficiency Test. 


The three test items were placed on a T 
scale, weighted by the beta value, and then
combined into a composite score. The classi- 


*See Table IV for statistical results in the development of 
the short test battery. 


TABLE IV 


STATISTICAL RESULTS IN DEVELOPMENT OF A
SHORT BATTERY OF ORGANIC EFFICIENCY


CRITERIA
(a) “Good”: Olympic Swimmers (40) and
    Varsity Swimmers (60).
(b) “Poor”: College Infirmary Patients
    (138).


SIGNIFICANT CORRELATIONS (Bi-serial)*
0 = Organic Efficiency (“Good” and “Poor” Criteria)
1 = Sitting Diastolic Pressure ....................... -.6310
2 = Sitting Pulse Pressure ...........................  .5427
3 = Standing Pulse Pressure ..........................  .3822
4 = Breath-Holding 20 seconds after [illegible] ......  .5354
5 = Vital Capacity ...................................  .3302
6 = Pulse Pressure ÷ Diastolic (Sit) .................  .6354
7 = Pulse Pressure ÷ Systolic (Std) ..................  .4355
8 = Pulse Pressure × Pulse Rate [illegible] ..........  .3738
9 = Pulse Pressure × Pulse Rate [illegible] ..........  .4018

MULTIPLE CORRELATIONS (Good-Poor Criteria)
(The subscripts identifying each combination of items are illegible in the source.)
R = .7583        R = .7540
R = .7566        R = .7489
R = .7141        R = .6995
[...]ed Battery)
R = .7837        R = .7186


* Only the significant correlations are listed.
The original battery included twenty-six tests.







fication scale is based on 1067 such composite 
scores. The scale is divided into ten divisions 
by the decile values. 
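The scoring procedure described above can be sketched as follows. The abstract gives neither the beta values nor the T-scale construction, so the weights and raw scores below are hypothetical, and a linear T score (mean 50, standard deviation 10) stands in for whatever scaling was actually used.

```python
from bisect import bisect_right
from statistics import mean, pstdev, quantiles


def t_scores(raw):
    """Linear T scores: mean 50, standard deviation 10 (an assumed scaling)."""
    m, s = mean(raw), pstdev(raw)
    return [50.0 + 10.0 * (x - m) / s for x in raw]


def composite_scores(items, betas):
    """Beta-weighted sum of T scores, one composite per subject."""
    t_items = [t_scores(column) for column in items]
    return [sum(b * t for b, t in zip(betas, row)) for row in zip(*t_items)]


def decile_division(score, cut_points):
    """Division 1 to 10, given the nine decile cut points."""
    return bisect_right(cut_points, score) + 1


# Hypothetical raw scores for twelve subjects on the three items.
sitting_diastolic = [68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90]
breath_holding = [40, 38, 36, 35, 33, 30, 28, 27, 25, 22, 20, 18]
standing_pulse_pressure = [44, 42, 40, 41, 38, 36, 35, 33, 30, 31, 28, 26]
betas = [-0.5, 0.4, 0.2]  # hypothetical weights; diastolic is negatively related

composites = composite_scores(
    [sitting_diastolic, breath_holding, standing_pulse_pressure], betas
)
cut_points = quantiles(composites, n=10)
divisions = [decile_division(c, cut_points) for c in composites]
print(divisions)
```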


CONCLUSIONS 


(1) The eleven cardio-vascular tests selected
for this research, with the exception of four 
relationships out of a possible 55, are not 
consistent in classifying pupils in functional 
efficiency. The reasons for this lack of con- 
sistency are two: (1) the eleven tests are 
described by four different factors with a 
range of variance of 51.30 percent to 100
percent; the uniqueness variance ranges from
48.70 percent to 0 percent; and (2) the lack
of validity. Only the McCurdy—Larson Or- 
ganic Efficiency Test reached the .80 standard 
for test validity. The Stone test, however, 
has a fair degree of validity. The best pupil 
classification is given by the Organic 
Efficiency Test. 


(2) The McCurdy—Larson Organic Effi- 
ciency Test is the most significant in indi- 
cating the physiological changes due to 
training and to illness; the degree of signi- 
ficance is high for illness (.6454) and slight 
for training (.2810). 


(3) The McCurdy-Larson Organic Efficiency
Test is the most valid test of cardio-vascular
efficiency. The Stone and Tigerstedt
Tests are lower in validity, yet have sig- 
nificant validity. 


(4) The Differences in Pulse Rate Test,
Pulse Pressure × Pulse Rate Test, and the
Pulse Pressure × Pulse Rate divided by
Diastolic Test have slight validity.


(5) The McCloy Test, Barach Test, 
Basal Metabolic Test, Differences in Systolic 
Test, and Crampton Test are invalid accord- 
ing to the “Good” and “Poor” physiological 
groups used as criteria in this research. 

(6) In order to use a cardio-vascular test 
for individual diagnosis the test must be re- 
peated once with the mean score used as the 
index score. This procedure will increase
the reliability to a significant value
(r = .67 to r = .80).
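The rise from .67 to .80 is what the Spearman-Brown formula gives when a test is doubled in length. The abstract does not name the formula, so this reading is an assumption, with $r_1$ the reliability of a single administration and $r_2$ that of the mean of two:

```latex
r_{2} = \frac{2\,r_{1}}{1 + r_{1}} = \frac{2(0.67)}{1 + 0.67} = \frac{1.34}{1.67} \approx 0.80
```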

(7) The “Short Organic Efficiency Test” 
developed in this study has a higher degree 
of validity than any of the tests in the study 
except the McCurdy—Larson Organic Effi- 
ciency Test. 













BIBLIOGRAPHY 


1. Books 


Garrett, H. E. Statistics in Psychology and
Education, New York: Longmans, Green,
and Co., Second Edition, 1937.

Lamb, F. W. Introduction to Human Experimental
Physiology, New York: Longmans,
Green, and Company, 1930.

McCurdy, J. H. The Physiology of Exercise, 
Philadelphia: Lea and Febiger, 1928.

Rogers, F. R. Physical Capacity Tests in the 
Administration of Physical Education, New 
York: Teachers College Contribution to 
Education, Columbia University, 173, 1926. 

Schneider, E. C. The Physiology of Muscular 
Activity. Philadelphia: W. B. Saunders 
Co., 1933. 

Thurstone, L. L. The Vectors of Mind, 
Chicago: University of Chicago Science 
Series, 1935. 

Thurstone, L. L. The Theory of Multiple 
Factors, Ann Arbor, Michigan; Edwards 
Bros., 1934. 

Thurstone, L. L. A Simplified Multiple
Factor Method, Chicago, Ill., University of 
Chicago Book Store, 1933. 


2. Articles 


Barach, J. H. The Energy Index (S.D.R.) 
of the Circulatory System, Arch. of Int. 
Med., 24:5 (Nov. 15, 1919), pp. 509-523. 

Barringer, T. B. Studies of the Heart’s Func- 
tional Capacity, Arch. of Int. Med., 20:829 
(1917) 

Carpenter, A. Further Observations on 
Tuttle’s Test for Non-Compensated Heart 
Lesion, Research Quarterly of A. P. E. A. 
VIII:1 (March, 1937), pp. 130-132.

Crampton, C. W. Blood Ptosis, New York 
Medical Journal, (Nov. 8, 1913). 

Crampton, C. W. A Test of Condition, Re- 
print from Medical News, (Sept. 16, 1905) 
pp. 22. 

Crampton, C. W. Blood Pressure, Am. Phys. 
Ed. Rev., XII:2 (June, 1907). 

Erlanger and Hooker, The Johns Hopkins 
Hospital Reports, XII: (1904). 

Foster, W. L. A Test of Physical Efficiency, 
A. P. E. R. XIX:9 (Dec. 1914), pp. 632-
636. 







Gales, A. M. Estimation of the Basal Meta- 
bolic Rate from Formula Based on Pulse 
Rate and Pulse Pressure, Lancet, London. 
Vol. I:XXIV (June 13, 1931), pp. 1287- 
1283. 

Henderson, Y. Volume Changes of the Heart, 
Phys. Rev. III:2 (April, 1923).

Hunt, C. H. and Pembrey, Tests of Physical 
Efficiency, Part I:Guy’s Hospital Reports, 
71 (1921). 

Karpovich, P. V. A Study of Some Physiological
Effects of Golf, Am. Phys. Ed. Rev.,
XXXIII:9 (Nov. 1928) 

Lee, C. N. A Further Study of Tuttle’s Test 
as a Means of Detecting Non-Compensated 
Organic Heart Lesions, Research Quarterly
of A. P. E. A. VIII:1 (March, 1937), pp.
123-129. 

McCloy, C. H. A Cardio-Vascular Rating of 
Present Condition, Arbeitsphysiologie,
Berlin; 4:2 (March, 1931). 

McCloy, C. H. A Program of Tests and 
Measurements for the Public Schools, 
Journal of Health and Physical Education, 
VI:8 (Oct. 1935) 


McCurdy, J. H. and Larson, L. A. The 
Measurement of Organic Efficiency for the 
Prediction of Physical Condition, Supple- 
ment to Research Quarterly of A. P. E. A., 
VI:2 (May, 1933) 


McCurdy, J. H. and Larson, L. A. The 
Measurement of Organic Efficiency for the 
Prediction of Physical Condition in Con- 
valescent Patients, Research Quarterly of 
A. P. E. A. 1:4 (Dec. 1935) 

McCurdy, J. H. and Larson, L. A. The Reli- 
ability and Objectivity of Blood Pressure 
Measurements, Supplement to Research 
Quarterly of A. P. E. A. VI:2 (May, 1935)

Meylan, C. L. Twenty Years Progress in 
Tests of Efficiency, Am. Phys. Ed. Rev. 
XVIII:7 (Oct. 1913) pp. 441-445. 

Richardson, M. W. and Stalnaker, J. M.
A Note on the Use of Bi-Serial R in Test 
Research, Journal of Gen. Psychology, 
VIII:2 (April, 1933), pp. 463-465. 

Schneider, E. C. A Cardio-Vascular Rating as 
a Measure of Physical Fatigue and Efficiency,
J. A. M. A. 74:22 (May 29, 1920),
pp. 1507-1510. 













Schwartz, L. and Britton, R. H. and Thomp- 
son, L. R. The Effect of Exercise on the 
Physical Condition and Development of 
Adolescent Boys, Washington: U. S. Public 
Health Service, Bulletin 179, (1928). 

Shelton, Ruth. On the Relation of Pulse 
Pressure to Output of the Heart, Jour. of 
Physiol. U. College, London, 55 (1921).

Smith, B. Blood Pressure Studies of Five 
Hundred Men, J. A. M. A. 71:3 (1918). 

Stone, W. J. The Clinical Significance of 
High and Low Pulse Pressure with Special
Reference to Cardiac Load and Overload,
J. A. M. A. LXI:14 (Oct. 4, 1913), pp. 
1245-1259. 

Turner, A. The Adjustment of Heart Rate 
and Arterial Pressure in Healthy Young 
Women During Prolonged Standing, Am. 
Jour. Physiol. Vol. 81 (1927), pp. 192-214.

Tuttle, A. A. The Use of the Pulse Ratio 
Test for Rating Physical Efficiency, Research
Quarterly of A. P. E. A. II:2 (May,
1931), pp. 5-17.










AN ANALYSIS OF SOME NEW STATISTICAL METHODS FOR 
SELECTING TEST ITEMS* 


ROBERT F. BARRY
University of Rochester and John Marshall High School 


Any addition to the already long list of 
methods for evaluating test items should be 
defensible on one or more of three grounds: 
greater speed, greater validity, or greater reliability.
The time-consuming labor of “best”
existing methods makes statistical selection
of items impractical (10, 11, 13, 15, 18, 19, 
21, 24) for those who should use it most, i.e., 
the public school teachers. 

This paper sets forth a new method, whose 
chief virtues are speed and simplicity; com- 
pares the validity of items so selected with 
the validity of items selected by biserial r, 
and compares the consistency of its evalua- 
tion of items with the consistency of biserial 
r for the same items. 

The method gives an index of discrimina- 
tion which combines two distinct elements: 1) 
positional with respect to criterion cate- 
gories, and 2) quantitative with respect to 
deviations of obtained distributions from a 
standard distribution, category by category. 


DEVELOPMENT OF THE METHOD 


Degrees of ability commonly are expressed 
by the grade categories A, B, C, D, and E. 
In the most valid items, the proportion of 
successes should decrease in that order, 
ABCDE. Thus, proportionately more A 
pupils should pass an item than B pupils, 
more B’s than C’s, more C’s than D’s, and 
more D’s than E’s. 

Pupils must first be assigned to criterion 
categories by some independent objective 
measure. Then for each item, compute per 
cent success for each category. This makes 
it possible to arrange categories in order of 
their per cent success. In Table I this posi- 
tional factor is perfect for both items. How- 
ever, item 2 obviously is superior to item 1, 
since it discriminates by wider margins be- 
tween categories. 

The concepts underlying this positional
factor, hereafter called Dp, are simple. Let 1
represent perfect order of categories in per
cent passed. Let 0 represent complete reversal
of order. Since there are 10 possible
corrections in a 5-step scale, i.e., n(n-1)/2,
each correction is penalized by subtracting
one-tenth from perfection, 1.


TABLE I

                 Item 1               Item 2
            Observed             Observed
            Per Cent             Per Cent
Category    Success    Order     Success    Order
A              52        1         100        1
B              51        2          75        2
C              50        3          50        3
D              49        4          25        4
E              48        5           0        5
            Dp = 1               Dp = 1


* Abstract of thesis submitted to the University of Rochester
in partial fulfillment of master’s requirements. Under the
direction of Dr. Jack W. Dunlap.


For example, if ranking by per cent passing
in each category gives the order BACDE,
then there is one inversion. So one correction
is necessary. Hence, Dp is .9. If the order
is CABDE, two corrections are necessary,
making Dp .8. To break a tie between categories
would be considered as a half
correction.
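
The counting of corrections described above can be sketched in Python. This is only an illustration, not the author's working procedure: the function name `positional_factor` is hypothetical, and the sketch assumes that every tied pair of categories costs one half correction, which is one reading of the tie rule.

```python
from itertools import combinations


def positional_factor(pct_success):
    """Return Dp for per cent success listed from the top category (A) to the bottom (E)."""
    n = len(pct_success)
    max_corrections = n * (n - 1) / 2  # 10 for a 5-step scale
    corrections = 0.0
    # combinations() keeps list order, so `upper` always belongs to the higher category.
    for upper, lower in combinations(pct_success, 2):
        if upper < lower:
            corrections += 1.0  # an inversion needs one correction
        elif upper == lower:
            corrections += 0.5  # assumption: each tied pair counts as a half correction
    return 1 - corrections / max_corrections


perfect_order = [52, 51, 50, 49, 48]  # item 1 of Table I, order ABCDE
order_bacde = [70, 80, 60, 50, 40]    # hypothetical per cents giving BACDE (one inversion)
order_cabde = [80, 70, 90, 50, 40]    # hypothetical per cents giving CABDE (two inversions)

for pcts in (perfect_order, order_bacde, order_cabde):
    print(round(positional_factor(pcts), 2))
```

Counting pairwise inversions gives the same totals as the two worked orders in the text: one correction for BACDE and two for CABDE.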

Now consider the second factor, quantitative
discrimination, Dq. Entirely distinct from
the position of the category, Dq is to
represent the closeness with which the obtained
per cent pass approaches some standard
per cent success for the categories.
Hence, we must first determine what standard 
per cents will be used. This involves: 1) the 
population in each category, and 2) the ideal 
spread between categories. Both of these are 
necessary in order to get a measure of the 
quantitative difference between items. 

The following assumptions form the basis 
for determining what percentage of success 
should be considered as “standard” for each 
of the five categories: 


1. That the optimum difficulty of an item 
is 50 per cent (5, 9, 12, 16, 17, 20, 22, 
23, 25). 

2. That the distribution of ability follows
the normal curve. The proportion in
each category is unimportant as long as
it is reasonable. The ratio 10, 20, 40,
20, 10 per cent has the distinct advantage
of being reasonable, simple to
compute, and commonly used.

3. That the per cent passing in each category
(the standards) should be respectively
95, 80, 50, 20, 5, with an average
of 50 per cent.


Table II shows the items of Table I compared
with these so-called standards.
Column 4 shows for each category of item 1
the difference between the observed per cent
success and the standard. The summation of
these differences is 144. In order always to
express this sum as a decimal, divide it by
the maximum possible summation of differences,
which is 400. This gives, then, for
item 1, the decimal .36. However, desirability,
i.e., closeness to the standards, would
then be represented by a low value. Therefore,
to have goodness represented by a high
value, the complement, .64, is used as the
value of Dq.

The formula for Dq then becomes $D_q = 1 - \frac{\sum d}{400}$,
where $\sum d$ is the summation of the differences.
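
As a check on the arithmetic, the quantitative factor can be computed for the two items of Table I. The sketch below is illustrative only: the names `STANDARDS` and `quantitative_factor` are hypothetical, and the derivation of the divisor 400 (the largest deviation possible in each category, summed) is offered as one way of arriving at the figure the text states.

```python
STANDARDS = (95, 80, 50, 20, 5)  # standard per cent success for categories A to E


def quantitative_factor(pct_success, standards=STANDARDS):
    """Return Dq = 1 - (sum of absolute differences from the standards) / (maximum sum)."""
    total_diff = sum(abs(obs - std) for obs, std in zip(pct_success, standards))
    # Largest possible deviation per category is max(std, 100 - std):
    # 95 + 80 + 50 + 80 + 95 = 400 for the standards above.
    max_diff = sum(max(std, 100 - std) for std in standards)
    return 1 - total_diff / max_diff


item_1 = (52, 51, 50, 49, 48)   # differences 43, 29, 0, 29, 43 -> sum 144
item_2 = (100, 75, 50, 25, 0)   # differences 5, 5, 0, 5, 5 -> sum 20

print(round(quantitative_factor(item_1), 2))  # 0.64
print(round(quantitative_factor(item_2), 2))  # 0.95
```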

The next task is to combine Dp and Dq
into one value, the index of discrimination,
I.D. This raises the question of their relative
importance. It is difficult to obtain a
criterion for evaluating these two, although
later Table VI will show that they are of
fairly equal importance in selecting items
independently. Symonds (20) has shown that
the reliability and validity of a test are functions
of the item difficulty. Thurstone (22),
Richardson (17), Uhrbrock (23), Elveback
(9), Cleeton (5), and Voss (25) have shown
experimentally that both the reliability and
the validity of an item decrease as difficulty
varies in either direction from 50 per cent.
Plotting the index values against the per cent
of difficulty shows that the higher index
values are concentrated between 80 and 20
per cent difficulty. The shape of the curve
corresponded roughly with that of Thurstone
(22) for reliability and with that of Voss (25)
for validity.


At present, then, Dp and Dq are assumed
to be equal in weight. Hence, they are combined
by simple multiplication to give an index,
I.D., which lies between 1 and 0. The
index for item 1 becomes .64. For item 2,
which is plainly superior, .95.
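
The combination by multiplication can be illustrated with a short, self-contained Python sketch. It is not the author's computing routine: the function names are hypothetical, ties are assumed to cost half a correction per tied pair, and the third item is invented to show how a single inversion lowers the index.

```python
from itertools import combinations

STANDARDS = (95, 80, 50, 20, 5)  # standard per cent success, categories A to E


def dp(pcts):
    """Positional factor: 1 minus one-tenth for each correction on a 5-step scale."""
    pairs = list(combinations(pcts, 2))
    # Assumption: an inversion costs 1 correction, a tied pair costs 1/2.
    corrections = sum(1.0 if a < b else 0.5 if a == b else 0.0 for a, b in pairs)
    return 1 - corrections / len(pairs)


def dq(pcts):
    """Quantitative factor: complement of the summed differences over 400."""
    return 1 - sum(abs(obs - std) for obs, std in zip(pcts, STANDARDS)) / 400


def index_of_discrimination(pcts):
    """I.D. = Dp * Dq, with the two factors given equal weight."""
    return dp(pcts) * dq(pcts)


items = {
    "item 1": (52, 51, 50, 49, 48),              # Table I
    "item 2": (100, 75, 50, 25, 0),              # Table I
    "hypothetical item": (70, 80, 60, 50, 40),   # order BACDE, invented for illustration
}
for name, pcts in items.items():
    print(name, round(index_of_discrimination(pcts), 3))
```

Because Dp is 1 for both items of Table I, their indices equal their Dq values, .64 and .95, as stated in the text.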


COMPARING THIS METHOD WITH BISERIAL r


Both the index and biserial r were com- 
puted for 69 General Science items using 176 
pupils, against a criterion test of 66 items. 
The 22 “best” items and the 22 “worst” items 
by each method, together with a new crite- 
rion test, then were administered to another 
group of 176 pupils. 

In addition, both indices were computed on 
150 items from an Educational Measurements 
test using 250 college students. Sub-tests con- 
taining various numbers of “best” and 
“worst” items, together with a new criterion 
test of 150 items, then were administered to 
another group of 255 individuals. 


TABLE II

             Standard          Item 1                    Item 2
             Per Cent    Observed                  Observed
                         Per Cent                  Per Cent
Category     Success     Success    Difference     Success    Difference
A               95          52          43           100           5
B               80          51          29            75           5
C               50          50           0            50           0
D               20          49          29            25           5
E                5          48          43             0           5

Sum of differences                     144                        20
Sum of differences / 400               .36                       .05
Dq                                     .64                       .95


Correlations between sub-tests and the criterion,
for General Science data, are shown in
Table III. Note that the most desirable comparison
is a high correlation for the best
items, with a low correlation both for the
worst items and for the best items versus the
worst items. It can be seen here that the
items selected by the index are slightly, but
not significantly, more valid than those
selected by biserial r.


TABLE III

GENERAL SCIENCE DATA

THE CORRELATIONS BETWEEN A CRITERION TEST AND VARIOUS COMBINATIONS OF ITEMS
SELECTED BY THE TWO METHODS

                                   Index    Biserial r    Advantage of index
Best 22 items                      .699       .685             .014
Worst 22 items                     .562       .628             .066
Best 22 versus worst 22 items      .550       .594             .044


In Table IV, with Educational Measurements
data, an examination of the first four
pairs of correlations shows the index to be
slightly more valid in three cases, and slightly
less valid in the fourth. In the last two comparisons,
the situation is reversed, with biserial
r somewhat more valid than the index.

Since all these differences in correlations 
are so small, the net result of this study is 
that there is no significant difference in the 
validity of the two methods. 

The important practical reason for using 
the index in preference to biserial r is the 
speed of the method. In computing both 
values, the Hollerith tabulating and sorting 
equipment was used. With the index, counting
the passes per category on 150 items required
45 minutes. With biserial r, 255 minutes were
required to obtain the necessary totals for the
same items. In the operations that followed
machine tabulation, four times as much time 
was required to compute biserial r as to com- 
pute the index, although the most efficient 
computational methods available were used in 
both cases (7, 8). 


OTHER INVESTIGATIONS PERTAINING TO 
THE INDEX 


Having established that the index equals 
biserial r in validity and excels it in time of 
computation, several pertinent questions 
arise: 


1. Since several investigators have proved 
that both validity and reliability are 
definitely related to difficulty, what is 
the relationship thereto of both the index 
and biserial r? 

2. Since the widespread use of a method 
for selecting test items depends on its 
ease of computation (2,14), is it pos- 
sible that still another method would 
save even more time than the index 
without affecting validity significantly? 

3. Since the index is computed on the assumption
that both Dp and Dq have
selective validity, to what extent do
they assist in the validity of the index,
as might be indicated by their respective
validities in selecting items independently?

4. Since, to a considerable extent, Dq is
conditioned by difficulty, what is the


TABLE IV

EDUCATIONAL MEASUREMENTS DATA

THE CORRELATIONS BETWEEN A CRITERION TEST AND VARIOUS COMBINATIONS OF ITEMS SELECTED
BY THE TWO METHODS

                                   Index    Biserial r    Comparison
Best 25 items                      .765       .756        advantage of index .009
Worst 25 items                     .473       .468        disadvantage of index .015
Best 40 items                      .784       .771        advantage of index .013
Worst 40 items                     .608       .620        advantage of index .012

CORRELATIONS BETWEEN BEST AND WORST

Best 25 versus worst 25 items      .409       .333        disadvantage of index .076
Best 40 versus worst 40 items      .574       .500        disadvantage of index .074










TABLE V 
COMPARISON OF THE CORRELATIONS OF THE FIVE METHODS 


12 items out of 69 22 items out of 69 


Gen. Science Gen. Science Educ. Meas. Educ. Meas. 
Method r Rank Method r Rank Method r Rank Method r Rank 
I. CORRELATION WITH THE CRITERION OF SUB-TESTS OF “BEST” ITEMS 
index __. .624 3 index ___ .699 3 index ___ .765 2 index  _.. .784 3 
bis-r _... .689 5 bis-r _... .685 5 bier .... .106 3 ae. : a 
Dk wasn Me Oo ee 687 4 Saree .736 4 | eS. 820 1 
a 635 2 a eC __ ee 779 #1 Sa 808 2 
"eee 597 4 | ee. ae ee 676 5 -742 5 
II. CORRELATIONS WITH THE CRITERION OF SUB-TESTS OF “Worst” ITEMS 
index ___ .457 2 index _.. .562 1 index ___ .473 3 index ___ .608 1 
bis-r _.... .453 1 bis-r _... .628 3 bis-r _.___ .468 2 bis-r ____ .620 2 
ee 483 3 Tb. cscmmmcs Ce ee 429 1 | owe 635 3 
SS oe Se FC ae 563 4 See -724 4 
Ge «suse se = diff _..... .689 5& ga 655 5 Ge -735 5 
III. INTRA-CORRELATIONS OF ABOVE (BEST VS. WORST) 
index __. .889 2 index __. .550 1 index ___ .409 3 index ___ .574 2 
bis-r _._.. .870 1 bis-r _._.. .594 3 BO nic ae: oe bis-r __.. .600 1 
iD cantina ta 2 BS: sccenticeses 602 4 1 _ ae 3054 2 a .620 3 
ee De «+ wee 8 AT7 4 _ “Eee 690 5 
diff ..... 455 4 Ge u.nnx«. Oe © eee 507 5 a .628 4 

TABLE VI 


validity of Dq alone as compared with
that of difficulty alone in the selection
of items?


Using the same data, all of these questions
can be answered by extending the study to
include similar numbers of items selected by
Dp alone, Dq alone, and difficulty alone, all
methods which are simpler to compute.


This has been done in the manner used pre- 
viously. The result is 40 correlations with the 
criterion, and 20 intra-correlations as shown 
in Table V. Notice that altogether 12 groups 
of items are used: 4 “best” groups, 4 “worst” 
groups, and 4 “intra” groups. This makes 12 
opportunities to compare the correlations of 
the five methods. 


Taking the rankings of the five methods in
the twelve different groups shown above
results in the construction of Table VI. It is
significant that the index is the only one of
these methods that was never below third
place in the rank of its correlation.
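
The tallying behind Table VI can be sketched as follows, using the twelve ranks printed for the index in Table V. The code is an illustration under stated assumptions rather than the author's procedure: the helper name `total_from_counts` is hypothetical, and the rank list is read from the index rows of Table V (Parts I, II, and III, four columns each).

```python
from collections import Counter

# Ranks of the index in the twelve comparisons of Table V,
# read row by row: Part I, Part II, Part III.
index_ranks = [3, 3, 2, 3,
               2, 1, 3, 1,
               2, 1, 3, 2]

counts = Counter(index_ranks)      # how often each rank was attained
total_ranks = sum(index_ranks)     # the "Total Ranks" entry of Table VI


def total_from_counts(rank_counts):
    """Rebuild a total from counts of 1st..5th places: sum of rank times frequency."""
    return sum(rank * n for rank, n in enumerate(rank_counts, start=1))


print(dict(sorted(counts.items())), total_ranks)
# The Dp row of Table VI (3, 1, 4, 4, 0) can be checked the same way.
print(total_from_counts([3, 1, 4, 4, 0]))
```

A lower total means better average standing, and a method that never ranks below third has no counts in the fourth or fifth columns.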


Since the time for computation is an important
practical consideration, records were
kept to make possible the comparison shown
in Table VII. Since correcting the papers and
obtaining the raw scores are common to all
five methods, the timing began as
soon as the raw scores were obtained.





25 items out of 150 40 items out of 150


SUMMARY OF THE RANKS ATTAINED BY THE
FIVE METHODS IN THE TWELVE COMPARISONS

                        Ranks                 Total    Net
Method        1st   2nd   3rd   4th   5th     Ranks    Rank
index          3     4     5     0     0        26       1
bis-r          4     2     3     1     2        31       2
Dp             3     1     4     4     0        33       3
Dq             1     5     0     3     3        38       4
diff           1     0     0     4     7        52       5

TABLE VII

COMPUTATION TIME FOR FIVE METHODS FOR
EQUAL NUMBER OF ITEMS

Difficulty ................  2 minutes
Dp ........................  4 minutes
Dq ........................  8 minutes
Index ..................... 15 minutes
Biserial r ................ 60 minutes


CONSISTENCY WITH WHICH A METHOD 
EVALUATES AN ITEM 


In the literature on item analysis, very
little appears in regard to the consistency of
the coefficients obtained by various methods.
In comparing validation methods, Barthelmess
(3) correlated 25 item coefficients obtained
with 98 pupils with the coefficients on
the same 25 items obtained with 262 pupils,
“including the first 98.” The inclusion of the
original group in the computation of the second
group tends to lower the effectiveness of
her correlations, which ranged from .863 to .548
for the ten methods she studied. Cook (6)
mentions the “relative stability of indices” as
being a factor to consider in future comparisons
of validation methods. Abelson (1)
made an empirical study of the McCall
method. Using the coefficients on 40 items,
computed with 68 and 69 pupils respectively,
he obtained a correlation of .279 ± .099. The
small number of items produces such a high
P.E. as to make these findings of little value.
The small number of cases (68 and 69) still
further lowers the value of this study. Aside
from these faults, he failed to compare this
correlation with what another method might
have obtained.
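
The dependence of the P.E. on the number of items can be made concrete with the classical probable-error formula for a product-moment correlation, P.E. = .6745(1 - r²)/√N. The paper does not state which formula was used, so the sketch below is an assumption; the function name `probable_error` is hypothetical.

```python
import math


def probable_error(r, n):
    """Classical probable error of a correlation r based on n paired cases."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Abelson's correlation of .279 on 40 items: the formula gives a value
# close to the reported .099.
print(round(probable_error(0.279, 40), 3))

# With four times as many items the probable error is halved.
print(round(probable_error(0.279, 160), 3))
```

Here N is the number of items, since the items are the cases being correlated; this is why the first condition for a consistency study is a large number of items.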

So little has been done in this field that the 
last part of this study is devoted to a com- 
parison of the consistency of measurement of 
the three best methods described earlier in the 
study. This phrase, Consistency of Measure- 
ment, must be distinguished from Stability 
of An Item. There has been some confusion 
in the use of these phrases in the past. Brig- 
ham (4) and Wilson (26) both use “stability 
of the item,” while Cook (6) uses “stability 
of the index.” The writer suggests that here- 
after “consistency” be used only to refer to 
the index, and that “stability” be used only 
to refer to the item. 

It is difficult to separate these two. Brig- 
ham (4) tabulates validity coefficients on 15 
items using two different groups of 500 each. 
He speaks of “the relationship evident be- 
tween the two series of values,” but when they 
are correlated by ranks they yield but .28. 
Even this small degree of relationship cannot 
be ascribed entirely to stability of the items 
until the consistency of measurement of the 
method of computing the coefficients has been 
determined. 

Before attempting an investigation of con- 
sistency, certain conditions must be met: 


1. Use as large a number of objective items 
as possible so as to reduce the P.E. of 
the correlations. 

2. Use two groups each as large as possible 
so as to reduce the element of error in 
the original item indices. 

3. Equate the two groups in both range 
and distribution as nearly as possible. 

4. Have the instructional conditions for 
the two groups as nearly identical as 
possible. 







To meet conditions #1 and #2, ninth year
General Science data were used because they
afforded the maximum number of pupils who
took an objective examination. Conditions #2
and #3 were met in Table VIII by the largest
possible number of pupils, and in Table IX
by actually equating the groups on the basis
of scores on the entire test. To insure instructional
equality (#4), data for Table VIII were
all taken from one school, while data for
Table IX were taken from one school and at
one examination even at the sacrifice of numbers.
Three validation methods were studied:
biserial r, the index, and Dp.


Since ranking of items is the true purpose 
of item coefficients, and the coefficients them- 
selves are merely the means thereto, one table 
shows the correlations by ranks in addition to 
the correlations by coefficients. 


Three indices (this index, biserial r, and
Dp) were computed twice for 66 items.
Twenty-four of these items were drawn from 
the January 1938 examination and the in- 
dices were based on 313 students. Forty-two 
of the items were selected from the June 1937 
examination and the indices were based on 
176 subjects. 


Sixty-five of these items reappeared in the 
June 1938 examination, which was given to 
319 subjects. Thus for these 65 items it was 
possible to compute each index on two sepa- 
rate groups. The 66th item was administered 
both in June 1937 and January 1938. 


For each index the paired values were cor- 
related, with the results shown in Table VIII. 
The values were then ranked and the corre- 
lation between the ranks obtained. The values 
computed by the index were superior in con- 
sistency to the values computed by either 
biserial r or Dp.
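
The two kinds of consistency correlation, by values and by ranks, can be sketched in Python. Everything here is illustrative: the item coefficients are invented, the function names are hypothetical, and the rank correlation is taken as the product-moment correlation of the ranks (with tied values sharing their mean rank), which the paper does not specify.

```python
import math


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank 1 goes to the highest value; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Hypothetical index values for the same six items from two groups of pupils.
group_1 = [0.64, 0.95, 0.40, 0.72, 0.55, 0.81]
group_2 = [0.60, 0.90, 0.52, 0.70, 0.48, 0.85]

r_values = pearson(group_1, group_2)
r_ranks = pearson(average_ranks(group_1), average_ranks(group_2))
print(round(r_values, 3), round(r_ranks, 3))
```

Correlating ranks rather than values reflects the point made above that ranking the items is the real purpose of an item coefficient.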


It was possible to divide the June 1937 
General Science group into two groups, each 
of 176 students, equated as to mean score 
and standard deviation on the entire test. 
The index, biserial r, and Dp were computed
for 69 items for each group. This eliminates 
variation in instruction from year to year. 
For each method the correlations between the 
paired values are shown in Table IX. 


Examination of Tables VIII and IX indicates
that the index is more consistent than
biserial r or Dp. It might be argued that the
differences are attributable to chance, in view
of the small number of items, 66 and 69.




TABLE VIII

CONSISTENCY AS SHOWN BY CORRELATIONS OF
COEFFICIENTS, 66 ITEMS, 24 OF THEM FROM
JANUARY 1938, 313 PUPILS, AND 42 ITEMS
FROM JUNE 1937, 176 PUPILS, GENERAL SCIENCE
NINTH YEAR, WITH 66 ITEMS, 1 FROM
JANUARY 1938, 313 PUPILS, AND 65 ITEMS
FROM JUNE 1938, 319 PUPILS

             Index          Biserial r      Dp
Values       .698 ± .04     .559 ± .06      .506 ± .06
Ranks        .738 ± .04     .521 ± .06      .518 ± .06


TABLE IX

CONSISTENCY AS SHOWN BY CORRELATIONS OF
ITEM COEFFICIENTS, 69 ITEMS, JUNE 1937
GENERAL SCIENCE EXAMINATION, TWO
EQUATED GROUPS OF 176 EACH

             Index          Biserial r      Dp
Values       .784 ± .04     .456 ± .06      .469 ± .06


However, since the direction of the difference 
remains the same in both tables, this does not 
seem plausible. 


CONCLUSIONS 


1. The order of validity as shown by the
twelve comparisons of the five methods is
first, the index, then biserial r, then Dp,
then Dq, and lastly difficulty.

2. The order of computational time from
fastest to slowest is first, difficulty, then
Dp, then Dq, then the index, and lastly
biserial r.

3. The order of consistency for the three most
valid indices is first, the index, then biserial
r, then Dp.

4. Introducing the factor of consistency
lowers the importance of the factor of validity
as a criterion for selecting a validation
method.

5. The lack of absolute consistency of measurement
lowers the importance of validity,
and to some extent raises the importance
of computational time.

6. Dq alone and difficulty alone are not worth
using because of low powers of discrimination.

7. The index is slightly more valid, considerably
more consistent, and four times as
fast as biserial r.

8. Dp is nearly as valid, nearly as consistent,
and seven times as fast as biserial r.



SELECTED ANNOTATED BIBLIOGRAPHY 


Specific References Mentioned 


1. Abelson, Harold H. “The Improvement
of Intelligence Testing,” Contributions to
Education, No. 273, New York: Teachers
College, Columbia University, 1927.

Obtained a consistency correlation with
McCall’s coefficients of .279 ± .099
using 40 items.


2. Adkins, Dorothy C. “The Role of Statistical
Selection of Items in Test Construction,”
Abstract in Psychometrika,
1937, 2:69.

“Results to date scarcely justify the
application of complicated selection
technique.”


3. Barthelmess, H. M. “The Validity of
Intelligence Test Elements,” Contributions
to Education, No. 505, New York:
Teachers College, Columbia University,
1931.

Using ten methods for obtaining indices,
she found consistency correlations
of .893 down to .548 with 25 items, 98
pupils vs. 262 pupils “including the 98.”


4. Brigham, C. C. “A Study of Error,” New
York: College Entrance Examination
Board, 1932.

His data gave a consistency value of
.28 by computing the rank correlation
between two sets of validity coefficients
for the same 15 items.


5. Cleeton, Glen U. “Optimum Difficulty of
Group Test Items,” Journal of Applied
Psychology, 1926, 10:327-340.

Found optimum predictive value of
items among those of middle difficulty.


6. Cook, Walter W. “Measurement of General
Spelling Ability Involving Comparisons
Between Techniques,” University
of Iowa Studies in Education, 1932,
VI:6.

Found biserial r the longest to compute.
Suggests that relative stability of
indices should be considered.


7. Dunlap, Jack W. “Note on Computation
of Biserial Correlations in Item Evaluation.”
Psychometrika, 1936, 1:2, 51-58.

By transmuting raw scores to T-scores,
only the mean of those passing an item
needs to be computed.




8. Dunlap, Jack W. “Nomograph for Computing
Biserial Correlations,” Psychometrika,
1936, 1:2, 59-60.

Presents a nomograph which eliminates
one step in obtaining biserial r.


9. Elveback, Mary L. “Conditions Affecting
the Differentiating Capacity of Test
Items,” in “The General College Curriculum.”
Minneapolis, Minnesota, University
of Minnesota Press, 1937.

Finds that the most discriminating
items are neither very easy nor very hard.


10. Henry, Lorne J. “The Validation of an
Objective Test.” Unpublished, but reviewed
in Bulletin No. 3, Department of
Educational Research, University of
Toronto, 1935 (“The Validation of Test
Items”).

“If only one validity method is to be
used, that method should be biserial r.”


11. Lindquist, E. F., and Cook, Walter W.
“Experimental Procedure in Test Evaluation.”
Journal of Experimental Education,
1933, 1:163-85.

Picks Cook’s Index D and biserial r as
the best, but both are difficult to compute.


12. Long, John A. “The Comparative Merits
of Several Techniques for Determining
Validities of Test Items.” Abstract in
Psychological Bulletin, 1934, 31:676.

Results indicate that methods which
favor items of 50 per cent difficulty are
more effective than those that do not.


13. Long, John A. “Improved Overlapping
Methods for Determining Validities of
Test Items.” Journal of Experimental
Education, 1934, 2:264-67.

Mentions that Long Overlapping
method is less labor than biserial r.


14. Long, John A., Sandiford, Peter and
Others. “The Validation of Test Items.”
Bulletin No. 3, Department of Educational
Research, University of Toronto,
1935.

States that ease of computation may
be accepted as a legitimate consideration
as to which technique to adopt.


15. McCall, William A. “Construction of
Multi-Mental Scale.” Teachers College
Record, 1925-6, 27:394-415.

Biserial r is harder to compute than
Vincent’s Overlapping.


16. Otis, Arthur S. “The Reliability of Spelling
Scales, Involving a Deviation Formula
for Correlation.” School and Society,
1916, 4:750-756, 794-796.

Urges revision of Ayres Spelling Test
so that difficulty of items will be about
50 per cent so as to get the best items.


17. Richardson, M. W. “The Relation Between
the Difficulty and the Differential
Validity of a Test.” Psychometrika,
1936, 1:2, 69-76.

Test composed of items of 50 per cent
difficulty gives higher validity.


18. Swineford, Frances. “Biserial r versus
Pearson r as Measures of Test Validity.”
Journal of Educational Psychology, 1936,
27:471-472.

Author favors use of biserial r.


19. Swineford, Frances. “Validity of Test
Items.” Journal of Educational Psychology,
1936, 27:68-78.

Author objects to biserial r because of
complex computations, but admits it
gives the best measuring of an item.


20. Symonds, Percival M. “Choice of Items
for a Test on the Basis of Difficulty.”
Journal of Educational Psychology, 1929,
20:481-493.

Most accurate measures are items of 50
per cent difficulty.


21. Thorndike, E. L. “The Measurement of
Intelligence.” New York: Bureau of Publications,
Teachers College, Columbia
University, 1927.

Favors use of biserial r.


22. Thurstone, Thelma Gwinn. “The Difficulty
of a Test and Its Diagnostic Value.”
Journal of Educational Psychology, 1932,
23:335-343.

Best range of difficulty for items on a
test is from 30 per cent to 70 per cent.


23. Uhrbrock, R. S. “Analysis of 4,378 Test
Items.” Abstract in Psychological Bulletin,
1936, 33:737.

Items of from 45 per cent to 85 per
cent correctness proved the most valid.





24. Vincent, Leona. “A Study of Intelligence
Test Elements.” Contributions to Education,
No. 152, New York: Teachers College,
Columbia University, 1924.

Author concludes her Overlapping
method was equivalent to biserial r and
took less time. Concludes there is a relationship
between goodness of an item and
its difficulty.


25. Voss, Harold A. “An Experimental Investigation
of the Relationship Between
Difficulty and Reliability of Tests.” Unpublished
master’s thesis, New York:
Fordham University, 1937.

Tests of items from 15 per cent to 8;
per cent difficulty gave the highest reliability.


26. Wilson, William R., Welch, Gertrude,
and Gulliksen, H. “An Evaluation of
Some Information Questions.” Journal of
Applied Psychology, 1924, 8:206-214.

Stability of items proved by 17 of best
20 in 1922 being also among the best 20
in 1923.










THE VALIDITY OF THE MACHINE-SCORABLE COOPERATIVE 
ENGLISH TEST 


CONSTANCE M. McCULLOUGH
Hiram College, and
JOHN C. FLANAGAN
Cooperative Test Service


With the development of large-scale test- 
ing programs, schools and colleges have ex- 
perienced a growing need for more efficient 
methods of scoring examination papers. 
Teachers in large city systems burdened with 
clerical work, persons responsible for the 
speedy scoring of placement examinations, 
and all those who feel that a teacher should 
be something more than a fixture behind a 
moving red pencil have been aware of this 
need for greater efficiency in test scoring de- 
vices. In the construction of the Cooperative 
English Test Form OM,* the Cooperative 
Test Service of the American Council on 
Education has attempted to fill this need. 

Form OM of the English test is a seventy- 
minute, controlled response, multiple choice 
test, in which the student responds to the 
items by marking an answer sheet. The rapid- 
ity with which such an examination may be 
scored is due chiefly to the fact that the re- 
sponse is always a choice which may be indi- 
cated by the mere position of a mark. The 
counting of the correctly placed marks is 
obviously much simpler than the scoring of 
tests in which the scorer must constantly con- 
sider the correctness of unique answers and 
the value of partially correct answers. The 
use of the answer sheet eliminates the turn- 
ing of test booklet pages in scoring. If a scor- 
ing machine is available, the answer sheets 
are scored with even greater dispatch. At the 
conclusion of the examination period the test 
booklets may be put aside for use in another 
testing program whose expense is merely the 
purchase of a new supply of answer sheets. 

Unlike previous forms of the Cooperative
English tests, the machine-scorable form employs
the answer sheet technique in its usage
sections. Like earlier forms, however, it comprises
tests of English usage, spelling, and
vocabulary. The usage part consists of 60
crucial points of grammar and diction (12
minutes), 60 common uses of punctuation (15
minutes), 30 typical items of capitalization
(5 minutes), and 15 items on sentence structure,
each requiring the selection of the best
of four sentences (8 minutes). The spelling
part (10 minutes) contains 45 items of four
words each, of which one or none is misspelled.
There are 100 test words in the
vocabulary part (20 minutes), for each of
which the word nearest in meaning is to be
chosen from five possibilities.


* Carpenter, M. F., Lindquist, E. F., Cook, W. W., Paterson,
D. G., Beers, F. S., and Spaulding, Geraldine, Cooperative
English Test, Cooperative Test Service, New York City,


Since the publication of the new form in 
May 1938, a number of critics have voiced 
more loudly their objections to objective Eng- 
lish tests. It has been said that multiple 
choice items “give away” correct answers; 
that the use of the answer sheet, especially in 
the case of the punctuation section, puts a 
premium upon intelligence and puzzle-solving 
aptitude; that such a test is not a valid index 
of composition ability; and that an essay 
examination is the only appropriate way of 
measuring correctness and power of English 
expression. 


Through the cooperation of seven schools 
in four cities during May and June of 1938, 
data involving 2,000 high school students 
have been gathered which provide evidence 
on these much-debated issues and which es- 
tablish the answer sheet technique as unques- 
tionably adaptable to the field of English. 
Correlations have been obtained, using the 
scores made by large groups of high school 
students, which show substantial relation- 
ships between scores on the new test and 
the following criteria of validity: a minimum 
essentials test of grammar, scores on New 
York Regents’ examinations in third- and 
fourth-year high school English, which are 
three-hour examinations of the essay type, 
and the fifty-minute usage section of the Cooperative
English Test Form 1937, largely a
proof-reading test permitting free response
written in the test booklet. An additional
study has been made to show the relationship 
between the usage section of the OM form 
and teachers’ estimates of the students’ com- 
position abilities. Further study has consid- 
ered variations in school population and dif- 
ferences among groups of students segregated 
according to grade and ability within a given 
school. It is the purpose of this article briefly 
to set forth these findings. 


FORM 1937 AND FORM OM USAGE AND A
MINIMUM ESSENTIALS TEST OF
GRAMMAR


Evidence upon the relationship between a 
minimum essentials test in English grammar 
and Form OM and Form 1937 of the Cooper- 
ative English Test has been obtained in the 
high school of Suburban City A, a wealthy 
community near New York City. The mini- 
mum essentials test in question is a free re- 
sponse grammar hurdle of the type familiar 
to many English teachers. It resembles the 
Cooperative English tests in the inclusion of 
usage items. It is unlike these tests in that 42 
per cent of its content is of a technical nature, 
involving knowledge of grammar rules and 
terms rather than practical mastery of English 
correctness. 


In May 1938 a class of 247 roth grade 
students in Suburban City A was divided into 
two matched groups of approximately equal 
ability in the use of English, according to 
teachers’ estimates of oral and written work. 
The 119 students comprising one group were 
given the 40-minute usage section of the 
Form OM test; 128 students, the 50-minute 
usage section of the Form 1937 test. Both
groups took the 40-minute minimum essentials
test.


A correlation coefficient of .669 reveals the 
definite relationship between the usage section 
of the OM form and the minimum essentials 
test. The correlation of the minimum essen- 
tials test scores with the Form 1937 usage 
scores yields a coefficient of .653. There is 
reason to believe that a separate score for 
usage on the minimum essentials test, exclud- 
ing the technical grammar items, would have 
been more closely related to the usage scores 
on the Cooperative English tests. 
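
The coefficients reported here are correlations between paired scores. As a minimal sketch of how such a coefficient is obtained, assuming the usual product-moment formula was the one used, the following computes it for a handful of hypothetical scores. The function name and all score values are illustrative assumptions and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six students (illustration only, not the study's data).
usage_scores = [62, 75, 81, 90, 104, 118]
essentials_scores = [41, 55, 50, 66, 70, 83]
r = pearson_r(usage_scores, essentials_scores)
print(round(r, 3))
```

A coefficient near 1 indicates that students rank in nearly the same order on both tests; values such as .669 and .653 indicate a definite but far from perfect agreement.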




TOTAL SCORES ON FORMS OM AND 1937, AND
NEW YORK REGENTS’ EXAMINATIONS
IN ENGLISH


Two hundred and eighty 11th grade stu- 
dents and 230 12th grade students in Sub- 
urban City A were given New York Regents’ 
examinations in English in the June following 
the May administration of the Forms OM 
and 1937 tests. The Regents’ examinations in 
English represent the essay type of examina- 
tion which, in the opinion of those who oppose 
objective testing, is the only valid type of 
measuring instrument in the field of English. 
The two grades, each divided into comparable 
halves according to teachers’ estimates of 
ability to use English with correctness and 
ease, were then examined, half by the Form 
OM test and half by the Form 1937 test. 
Correlations of Regents’ test scores with the 
total English OM scores, representing objec- 
tive measurement of usage, spelling, and 
vocabulary, are .793 in the 11th grade and
.698 in the 12th grade. Regents’ scores and 
the Form 1937 scores are correlated .769 in 
the 11th grade and .695 in the 12th grade. 
The high degree of relationship between the 
essay type examination and the objective type 
is manifest in these indices. Were the essay 
examination more reliable, these correlations 
would probably be higher. 
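
The remark that a more reliable essay examination would yield higher correlations can be illustrated with the classical correction for attenuation, which divides the observed correlation by the square root of the product of the two reliabilities. This is a sketch under stated assumptions: the article reports no reliability figures, so the two reliability values below are hypothetical, and only the observed coefficient of .793 comes from the text.

```python
from math import sqrt


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Estimate the correlation between two measures if both were perfectly reliable."""
    for name, value in (("reliability_x", reliability_x),
                        ("reliability_y", reliability_y)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must lie in the interval (0, 1]")
    return r_xy / sqrt(reliability_x * reliability_y)


observed_r = 0.793            # 11th grade, Regents' vs. total Form OM (from the text)
essay_reliability = 0.80      # hypothetical value, not reported in the article
objective_reliability = 0.95  # hypothetical value, not reported in the article

estimated_r = correct_for_attenuation(
    observed_r, objective_reliability, essay_reliability
)
print(round(estimated_r, 3))
```

The lower the assumed reliability of the essay examination, the larger the estimated correlation becomes, which is the direction of the argument made above.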

Correlations of Regents’ scores and the 
Form OM usage part alone are .708 for the 
11th grade and .650 for the 12th, while those 
of Regents’ scores with the Form 1937 usage 
part are .731 for the 11th grade and .580 for 
the 12th. The coefficients for the 11th grade 
are logically higher than those for the 12th, 
since the third year high school Regents’ ex- 
amination in English contains more usage 
items than the fourth year examination, 
which is more concerned with literary criti- 
cism and acquaintance. Had scores on the 
Cooperative Literary Acquaintance and the 
Cooperative Literary Comprehension tests 
been added to the Cooperative English usage 
scores in computing the correlation with the 
Regents’ examination scores, doubtless the 
resulting coefficients would have been higher 
than those presented above. 


FORM OM USAGE AND FORM 1937 USAGE


Because the Form OM usage part is
completely objective, whereas the previous
usage parts of the Cooperative English tests
have been of the proof-reading and completion
type, special interest has centered about the
relationship between scores on the Form 1937
and Form OM usage parts.
Four high schools in two cities have con- 
tributed data for this study. One high school 
represents average ability and socio-economic 
status in Suburban City B near New York 
City. Three high schools, located in a large 
industrial city in the Middle West, represent 
the full scope of abilities and socio-economic 
status in that community. 


In all four schools students were admin- 
istered both the 1937 and the OM forms of 
the English usage test. Alternate students 
took Form OM first so that the practice effect 
of the tests might be equalized. In Suburban 
City B, where 350 9th graders were exam-
ined, usage scores on the two forms yield a
coefficient of .842. For 103 10th grade stu-
dents in the midwestern city a coefficient of
.848 is obtained; for 140 11th grade students
a coefficient of .854; and for 112 12th
grade students a coefficient of .881. Plainly 
the two types of test are measuring similar 
abilities. 
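
The alternation of test order described above can be sketched as follows. The function name and the student identifiers are hypothetical; the only detail taken from the text is that alternate students took Form OM first.

```python
def assign_form_order(student_ids):
    """Alternate the order of the two forms so practice effects are balanced."""
    orders = {}
    for index, student in enumerate(student_ids):
        if index % 2 == 0:
            orders[student] = ("OM", "1937")   # Form OM taken first
        else:
            orders[student] = ("1937", "OM")   # Form 1937 taken first
    return orders


# Hypothetical roster of five students.
roster = ["S01", "S02", "S03", "S04", "S05"]
orders = assign_form_order(roster)
for student, order in orders.items():
    print(student, order)
```

With this arrangement about half of the group meets each form without prior practice on the other, so neither form benefits systematically from being taken second.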

Of the two types the Form OM has at least 
two advantages. It requires a matter of nine 
seconds in the scoring machine or slightly 
more than a full minute for hand-scoring, 
while the scoring of the Form 1937 requires 
at least ten minutes on the part of a skilled 
reader. The OM form of usage test is to be 
preferred also for the fact that it contains 
thirty more points than the 1937 test and 
that it yields a standard deviation ten points 
greater in the typical high school group. In 
other words, the objective form tests more 
points of usage than the Form 1937 and 
shows greater variation in English usage 
scores in typical groups, according to our test 
results. 




FORM OM AND ABILITY GROUPING


A suggestion of the extent to which a 
school’s judgment of students’ academic abil- 
ities agrees with the measures of English abil- 
ity given by the Form OM test is shown in a 
study of a New England city high school. 
Students in the school are segregated into 
three curriculum divisions according to their 
marks prior to entrance into the ninth grade. 
Students of academic promise enter a typical 
college preparatory course. Average students 
whose aspirations and abilities do not suggest 
formal education beyond high school are 
given a social arts course of modified content 
and academic standard. Those students who 
are deficient in language skills and scholastic 
ability follow a technical arts course. In so 
far as academic success is determined by mas- 
tery of language, students’ scores on the Eng- 
lish OM test should discriminate among 
these groups. 

In Table I the average scores of 285 10th
and 11th grade students who took the Form
OM test are considered separately in three
curriculum divisions. The average scores are
comparatively low (note total possible
scores) because they represent grades 10
and 11, whereas the test is suitable
for college students as well. The differences
among these mean raw scores are obvious.


TABLE I

MEAN RAW SCORES ON FORM OM FOR NEW ENGLAND CITY CURRICULUM GROUPS,
GRADES 10 AND 11

                          No. of                  Mean Raw Scores
Curriculum Group          Cases       Grade    Usage   Spelling   Vocabulary

College Preparatory    [illegible]     10       81.0     17.7        36.1
                       [illegible]     11       87.0     19.4        38.0
Social Arts                72          10       57.6      9.6        17.9
                           74          11       66.2     14.4        24.2
Technical Arts             22          10       44.3      7.3        13.9
                           25          11       44.4      6.7        23.2

Total Possible Score                           165.0     45.0       169.0


Suburban City B sections its 9th grade
students into eleven ability groups, in which
placement is determined on the basis of
native ability (as measured by a group intelli-
gence test), English ability, and achievement
in previous school subjects. This sectioning
occurs at the beginning of the 9th grade. The
scores of 350 9th grade students on the OM
usage test correlate .762 with this ability
grouping. If it seems surprising that the cor-
relation is so high, it should be remembered
that the majority of high school subjects of
the traditional type require the use of Eng-
lish skills, and that verbal intelligence is
measured in the group test. Scores on the
usage section of Form 1937 for these same
students produce a coefficient of .798 with
the ability grouping. Apparently the free-
response type of test (Form 1937) taken
without an answer sheet and the objective
Form OM, which is completely machine-
scorable and administered with an answer
sheet, are about equally related to the factors
which are the basis of sectioning in this
school.


USAGE FORM OM AND VERBAL INTELLIGENCE


The punctuation section of the Form OM 
usage test has aroused much comment be- 
cause the student is required not only to de- 
cide what punctuation is needed at certain 
numbered points in a passage but to place a 
pencil mark under the appropriate punctua- 
tion marks on the answer sheet. In spite of 
the fact that the administration of the OM 
form is preceded by special exercise on a 
practice answer sheet, some critics fear that 
the specialized nature of the response to the 
punctuation section necessitates unusual in- 
telligence, puzzle-aptitude, and, in the words 
of one facetious commentator, “tweezer 
dexterity”. 

The vocabulary parts of the Cooperative 
English tests have always correlated closely 
with verbal intelligence. Should the vocabu- 
lary part of the test be found to correlate 
more highly with the usage part of the OM 
form than with the usage part of the 1937 
form, the implication would be that the OM 
form required more intelligence. If the vocab- 
ulary part correlated more closely with the 
punctuation section than with the OM usage 
part as a whole, obviously the punctuation 
section would be requiring more intelligence 
than the usage part. 







Certain obstacles confront the investigator 
in a study of these relationships. One is that 
no direct comparison between the punctuation 
items of the 1937 and OM forms is possible, 
since the 1937 usage part offers a running 
passage for usage correction with no separate 
consideration of punctuation. Another is that 
matters of punctuation may or may not actu- 
ally require more intelligence than matters of 
grammatical usage, sentence structure, and 
capitalization. A third is that, given two 
measures having the same true relationship 
with a third, the measure involving the more 
items will in general yield a higher correla- 
tion with the third because of its greater dis- 
criminatory power, and the measure requiring 
the greater length of time will produce more 
reliable indices of this relationship. The punc- 
tuation section of the OM form has a time 
limit of 15 minutes, while the entire OM 
usage part is given 40 minutes, and the usage 
part of Form 1937 is a 50 minute test. 
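
The third obstacle, that a longer measure tends to be the more reliable one, is commonly expressed by the Spearman-Brown formula. The article does not name this formula, so its use here is an assumption made for illustration, as is the reliability value for the punctuation section. The formula further assumes that the added material is parallel to the original; only the time limits of 15 and 40 minutes come from the text.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability of a test lengthened by `length_factor` with parallel items."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


punctuation_reliability = 0.80   # hypothetical; not reported in the article
length_factor = 40 / 15          # whole usage part (40 min.) vs. punctuation section (15 min.)

predicted = spearman_brown(punctuation_reliability, length_factor)
print(round(predicted, 3))
```

On these assumptions a 15-minute section cannot be expected to correlate with vocabulary as highly as a 40-minute part measuring the same ability, which is why a lower coefficient for punctuation alone is not in itself evidence of a difference in what is measured.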

In spite of all these uncertainties the data 
presented in Table II are rather convincing in 
their support of the thesis that the OM form 
of the usage test, including the punctuation 
section, is quite comparable to the 1937 form 
in respect to the factors tested, and appar- 
ently does not discriminate to a greater extent 
against students of poor verbal ability. In 
grades 10, 11, and 12 in the Midwestern city 
the coefficients of correlation between the two 
forms of the usage test range from .85 to .88. 
The usage part of Form 1937 correlates 
slightly more closely with vocabulary scores, 
and the coefficient designating the relationship 
between punctuation scores and vocabulary 
scores is lower than that for the entire OM 
usage part. 

TABLE II

RELATIONSHIPS AMONG USAGE, PUNCTUATION, AND VOCABULARY SCORES IN GRADES 10, 11, AND
12 IN A MIDWESTERN CITY, FORMS 1937 AND OM

                                       Coefficients of Correlation
                         No. of    Form 1937    Form OM     Form OM
Form and Grade           Cases       Usage       Usage     Punctuation

Form OM Usage
  Grade 10                103        .848
  Grade 11                140        .853
  Grade 12                112        .881

Form OM Vocabulary        356        .724        .690        .622


A new practice sheet has been issued by
the Test Service which deals specifically with
the type of response called for in the punctua-
tion section. If an individual student is of
such low verbal intelligence that the direc-
tions for the punctuation section present espe-
cial difficulty, the new practice sheet should
remove this barrier before the actual admin-
istration of the examination. So far, the Test
Service has no evidence on the relationship
between punctuation scores and “tweezer
dexterity”!


USAGE OM AND 1937, AND TEACHERS’
ESTIMATES OF COMPOSITION
ABILITY


Although marks in free composition are 
notoriously unreliable, they are probably the 
most common and specific measure of expres- 
sion in English. Teachers in the Midwestern 
city high schools whose students took the Co- 
operative tests were asked to indicate each 
student’s composition ability, both oral and 
written, by the letter grade A, B, C, D, or E, 
according to his standing in his grade and 
school. Table III shows the coefficients pro- 
duced by correlation of the usage parts of the 
OM and 1937 forms with the teachers’ 
estimates. 

The fourteen coefficients vary from .165 to 
.772. Nine of them are larger than .500. The 
variation observable in the table is spuriously 
large because of the sizes of the groups meas- 
ured and the paucity of letter-grade classifi- 
cations. Differences of opinion among teach- 
ers as to what factors constitute successful 
composition and in what proportion they con- 
stitute it are another possible reason for the 
vagaries in the coefficients. Teachers differ, 
too, in the amounts and kinds of evidence 
they have of students’ abilities to use English 
with correctness and ease. However, there can
be no doubt that some of the factors deter-
mining the teachers’ estimates are present in
the Cooperative English tests. While five of
the seven pairs of coefficients in Table III
suggest that the Form 1937 usage test is the
better measure of composition ability, the dif-
ferences are so slight that the evidence cannot
be considered conclusive.


TABLE III

RELATIONSHIP BETWEEN TEACHERS’ ESTIMATES OF COMPOSITION ABILITY AND SCORES ON THE
USAGE PARTS OF FORMS OM AND 1937 OF THE COOPERATIVE ENGLISH TEST

                          Coefficients of Correlation between Teachers’
                          Estimates of Composition Ability and Usage Score

Grade and School               N        Form OM        N       Form 1937

Grade 10, School H*           38         .165         32         .394
Grade 10, School M        [illegible]    .519         24         .706
[illegible]                   42         .391         43         .362
[illegible]                   34         .678         34         .772
[illegible]                   69         .525         63         .554
[illegible]                   41         .615         40         .425
[illegible]                   34         .539         34         .554

* H, M, and L stand for high, medium, and low, and refer to the levels of native ability
and socio-economic status in the schools.


CONCLUSIONS
The conclusions which may be drawn from
the foregoing data are as follows:


1. The usage parts of both the Form 1937
and the all-objective Form OM of the Co-
operative English Test yield scores which
agree fairly substantially with those obtained
from a 10th grade test of minimum essentials
of grammar. The correlation coefficients are
respectively .65 and .67 for the two forms.


2. Scores on the usage parts of these two
forms of the Cooperative English Test show
a substantial amount of agreement with the
third- and fourth-year English examination of
the New York Board of Regents. The corre-
lation coefficients range from .58 to .73 and
are similar for the two forms.


Combined scores for the usage, spelling,
and vocabulary parts of the Cooperative Eng-
lish Test show a somewhat closer relationship
with scores on the Regents’ examination, the
coefficients ranging from .70 to .79. It should
further be noted that the addition of the Lit-
erary Acquaintance and Literary Comprehen-
sion parts of the Cooperative English battery
would doubtless produce a considerable in-
crease in the agreement of the scores for the
objective tests with these total scores for the
essay-type test.

3. A comparison of scores obtained by stu- 
dents on the all-objective and the proof- 
reading forms of the Cooperative English 
Test indicates that the results from these two 
types of test are in rather close agreement. 
Correlation coefficients range from .84 to .88 
when students’ scores are compared within 
each single grade from the 9th to the 12th.


4. An analysis of the scores of students 
who have been segregated into ability groups 
in two school systems shows a very high de- 
gree of relationship between the combined 
criteria which have been used for sectioning 
purposes and scores on the Cooperative 
English Test. 


5. Analysis of the results for the vocabu- 
lary and usage parts of the Cooperative Eng- 
lish Test lends no support to the opinion that 
the all-objective, machine-scorable test puts 
a greater premium upon verbal intelligence 
than does the usual test situation. 




6. High school teachers’ estimates of their 
students’ powers of oral and written expres- 
sion bear a varying but significantly positive 
relationship to success on the usage parts of 
the Cooperative English Test. The median of 
fourteen coefficients of correlation between 
these variables is .53. The two forms of the 
usage test show a similar degree of relation- 
ship to these estimates, with the Form 1937 
test correlations slightly but not significantly 
or consistently higher. 
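
The summary figures quoted for the fourteen coefficients of Table III can be checked directly. The sketch below assumes the values as read from the table, with the third Form 1937 coefficient taken as .362, the reading consistent with the stated maximum of .772 and with the count of nine coefficients above .500. The variable names are illustrative.

```python
from statistics import median

# Coefficients from Table III, listed in the order of the seven grade-school groups.
form_om = [0.165, 0.519, 0.391, 0.678, 0.525, 0.615, 0.539]
form_1937 = [0.394, 0.706, 0.362, 0.772, 0.554, 0.425, 0.554]

all_coefficients = form_om + form_1937

median_r = median(all_coefficients)                       # middle of the fourteen values
above_half = sum(1 for r in all_coefficients if r > 0.500)
pairs_1937_higher = sum(
    1 for om, old in zip(form_om, form_1937) if old > om  # Form 1937 exceeds Form OM
)

print(round(median_r, 2), above_half, pairs_1937_higher)
print(min(all_coefficients), max(all_coefficients))
```

On this reading the median rounds to .53, nine coefficients exceed .500, and Form 1937 is the higher in five of the seven pairs, in agreement with the statements in the text.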


If there were in existence or in immediate 
prospect an essay examination highly reliable 
not only in its scoring but in itself, it is 
doubtful that even the most ardent propo- 
nents of objective testing would hesitate to 
declare such a direct measure of English ex- 
pression superior to a more indirect measure. 
But at the present time, the objective form of 
examination is the only means by which one 
may discern the extent of English skills and 
powers with speed, with uniformity, and with 
the personal equation of the examiner oper- 
ating at a minimum. 





A STUDY OF CERTAIN FACTORS INFLUENCING ACADEMIC 
ACHIEVEMENT WITH SPECIAL REFERENCE 
TO THE HEALTH FACTOR 


LOWELL N. DOUGLAS
Baylor University 


THE PROBLEM 


The problem concerns a study of certain 
factors thought to condition achievement in 
English at Baylor University with special 
reference to the health factor. 


PREVIOUS STUDIES 


A survey of literature reveals that for some 
time there has existed the belief that there is 
some direct relationship between mental and 
physical functioning. The general situation is 
presented in Tables I and II. 


TABLE I

SUMMARY OF PREVIOUS INVESTIGATIONS OF
PHYSICAL AND MENTAL RELATIONSHIPS

                      No          Positive       Negative
Studies          Relationship   Relationship   Relationship

Before 1900            4              6              0
1900-1910              4              8              2
1910-1920              2             20              2
1920-1930              4             18              0
1930-Present           0              6              1
Total                 14             58              5


TABLE II

SUMMARY OF RELATED HEALTH FACTORS AND
MENTAL ABILITY

                           No          Positive       Negative
Studies               Relationship   Relationship   Relationship

Nutrition                   3              3              1
Tonsils and Adenoids   [illegible]         3              0
Glandular Therapy           1              0              0
Intestinal Toxemia          0              1              0


The studies referred to in Tables I and II 
employed various methods in attempting to 
measure the health factor. Anthropometric 
measures were first used: weight and height 
were used individually, then in combination;
soon sitting height was added as a separate
measure. On the basis of a great number of 
such measures, standards corresponding to 
age levels were worked out. Then came the 
idea of measuring vital capacity, breathing 
capacity or chest circumference, then expan- 
sion. This measure was usually added to the 
weight/height index. Vierordt’s formula for 
determining weight in connection with body 
length and chest circumference endeavored to 
present this combination mathematically. 
Some investigators believed that strength was 
a factor in physical well-being, and intro- 
duced the measures of grip, generally used 
with other measures. Naccarati evolved the 
morphologic index which interprets the body 
in terms of the length of the extremities and 
the volumetric value of the trunk. Students 
of anatomy introduced ossification ratios for 
use primarily in the measurement of children. 
Crampton, in studying boys, declared that 
the age of pubescence affected physical and 
mental health. Still other individuals felt 
that the mere listing of the number of physi- 
cal defects found in an individual would yield 
his health score. Several studies were con- 
ducted on this basis. Dr. Beyers of the 
United States Navy felt the inadequacy of 
previous attempts at measuring health and 
employed, in addition to the weight/height
index, Vierordt’s formula, etc., a physical con-
dition scale. Perfect health was rated at
100%, and the scale consisted of six sections. 
On this he, as examining physician, rated the 
individuals under observation. The measure 
was subjective, it is true; but it was the sub- 
jective judgment of authority based on thor- 
ough physical examination. This method he 
employed in 1900. Since that time many ad- 
ditions have been made to physical examina- 
tion procedure, making it more reliable and 
more objective. Such is the summary of the 
measures employed in the previous studies of 
health. 




THE PROCEDURE EMPLOYED IN THIS 
INVESTIGATION 


In the present study 109 freshman boys at 
Baylor University were enrolled in three Eng- 
lish classes under the direction of one teacher. 
The time consisted of daily classes of one 
hour five days a week for twelve weeks. Five 
measures were used in estimating achieve- 
ment in English for each of the participants: 
(1) teacher’s marks, (2) departmental test 
averages, (3) the scores on the Purdue Place- 
ment Test in English, Form B, administered 
at the end of the term, (4) the raw gain score 
as shown by the difference between the scores 
on Purdue, Form A, administered at the be-
ginning of the term, and Form B, admin-
istered at the conclusion of the term, and
(5) the percentage of gain as disclosed by 
Form A and Form B of the Purdue Test. 
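
Measures (4) and (5) can be sketched as simple arithmetic on the two Purdue scores. The raw gain follows the description above. The article does not define the percentage of gain, so expressing the gain as a percentage of the Form A score is an assumption, and the scores shown are hypothetical.

```python
def gain_scores(form_a_score, form_b_score):
    """Return (raw gain, percentage gain) from initial Form A to final Form B."""
    if form_a_score <= 0:
        raise ValueError("Form A score must be positive to compute a percentage gain")
    raw_gain = form_b_score - form_a_score
    # Assumed definition: gain expressed relative to the initial (Form A) score.
    percent_gain = 100.0 * raw_gain / form_a_score
    return raw_gain, percent_gain


# Hypothetical student: 80 on Form A at the start, 100 on Form B at the end.
raw_gain, percent_gain = gain_scores(80, 100)
print(raw_gain, percent_gain)
```

The two measures can rank students differently: a student who begins low and gains a few points may show a larger percentage gain than a student who begins high and gains the same number of points.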

Other factors considered in the study were 
initial English ability, measured by the Amer- 
ican Council on Education Cooperative Eng- 
lish Test, 1937, the general high school aver- 
age, the high school English average, the 
average daily study time, study habits as ana- 
lyzed by the Wrenn Study Habits Inventory, 
intelligence as measured by the American 
Council on Education Psychological Examina- 
tion for College Freshmen, 1937 Edition, 
reading comprehension as determined by the 
Iowa Silent Reading Test: Advanced, Form A 
(Revised), rate of silent reading as measured 
by the same test, leadership as indicated by 
the Morris Trait L by Elizabeth Morris, per- 
sonality as rated by the Bernreuter Person- 
ality Inventory, socio-economic status as de- 
termined by the Sims Score Card for Socio- 
Economic Status, social adjustment as esti- 
mated by the Washburne Social Adjustment 
Inventory, and chronological age. 

For the measurement of the health factor,
health was considered from the physiological
viewpoint, the comprehensive medical exam-
ination being employed. Much consideration
was given to the determination of the com- 
prehensive physical examination form. Forms 
used in numerous university and health serv- 
ice divisions, hospital and clinic forms, forms 
used by individual physicians, and those used 
by major insurance companies were carefully 
studied. Then from a study of standard text- 
books in physical diagnosis and health exam- 
inations, the items for the health examination 
were determined, as well as the tests sug- 
gested for measuring the items listed. The 
form¹ thus constructed was then given author-
itative approval by a board of seven practic-
ing physicians. The university physician for
men administered all 109 of the examinations
in order to have a uniform evaluation of each 
item. He was assisted in the non-technical 
parts by the university nurses. Students show- 
ing definite tendencies toward defects were 
fluoroscoped or x-rayed. The basal meta- 
bolism test necessitated a second appointment 
with each student. 

In addition to the physical examination 
administered to each participant in the study 
at the first of the term, a weekly health exam-
ination was made. Because some health
measures, such as temperature, pulse rate, and
weight, fluctuate with individual habits and
the time of day of recording, the same day
each week and the same hour were scheduled
for each student’s weekly appointment. The
university physician administered the tests
and recorded the health history of the
individual for the week together
with his personal observations. Twelve 
weekly recordings were made according to 
the following weekly health examination
form:


[illegible] ____________          [illegible] ____________

First Week

[illegible] ________    Time of Examination ________    Throat ________
Temperature ________    [illegible] ________            [illegible] ________
[illegible] ________    After exercise ________         [illegible] ________

Remarks:

Have you suffered from any of the following during the week?

Colds
Headaches
Constipation
Thoracic pains
Abdominal pains
Eyes

Slight


¹The physical examination form used in the study was
not included in this review because of lack of space. Those
desiring a copy please request it of the author.


EVALUATION OF STUDENT HEALTH ON THE 
BASIS OF THE COMPREHENSIVE PHYSICAL 
EXAMINATION AND THE WEEKLY PHYSICAL 
EXAMINATION 


At the present time there is no standard 
method of scoring a comprehensive physical 
examination, even though several attempts 
have been made to score such examinations. 
One method employed in several studies is to 
score the individual according to the number 
of physical defects revealed by the examina- 
tion. The unsoundness of this method is at 
once apparent because the nature of the de- 
fect is of more significance than the number 
of defects. Another method employed, prima- 
rily by insurance companies, is to rate the 
individual’s health by (1) the number of im- 
pairments, (2) impairments present, (3) im- 
pairments demanding medical treatment, and 
(4) impairments demanding immediate med- 
ical treatment and care. Varying interpreta- 
tions of the four levels and the fact that the 
impairments are concerned with longevity 
rather than immediate health status and 
health function make the plan impractical for 
use in the present study. Some theorists have 
suggested that various values be assigned to 
the systems of the human organism and the 
individual’s health be evaluated by taking the 
sum of these systemic evaluations. Since the 
body systems are so highly integrated and 
correlated that separate functioning is impos- 
sible to ascertain, this measure seems un- 
sound. In spite of the absence of a standard 
method of scoring physical examinations, all 
physicians make such evaluations when they 
say that a person is in “good health” or “poor 
health.” 

In this particular study the classification of 
the health form was made by an appeal to 
authoritative opinion. First, an assumption 
was made that there are at least six distin- 
guishable classifications of individual health 
status: (1) very good, (2) good, (3) mod- 
erate, (4) poor, (5) bad, and (6) very bad. 
Such classifications we have identified by the 
letters A, B, C, D, E, and F. In this manner 
we have avoided the giving of dubious numer- 
ical values. Each member of a group of five 
physicians was given the comprehensive 
physical examination form and asked to indi- 
cate just what conditions should be present 
to indicate ratings of A, B, C, D, E, and F. 
After an opportunity for individual study, the 
physicians met as a group and discussed the
independent classifications, listing the func-
tionings and defects which would place an in-
dividual in the respective classes. Classifica-
tion of the comprehensive physical examina-
tion of each student participating in the study
was then made according to the rating scale 
determined by this authoritative judgment. 
Table III shows the distribution of the stu- 
dents according to the comprehensive physical 
examination rating: 


TABLE III

DISTRIBUTION OF STUDENTS ACCORDING TO THE
PHYSICAL EXAMINATION RATING

                 Number of
Class            Students

A                    8
B                   30
C                   40
D                   21
E                    5
F                    1

Total              105


CRITERION FOR CLASSIFYING SUBJECTS
ACCORDING TO THE PHYSICAL
EXAMINATION FORM

Class A 

The normal cardiac measurements set up 
in Class A were taken from a recognized 
standard text book of medicine. The heart 
measures were ascertained by percussion and 
palpation. In any case of doubt as to the size 
of the heart, fluoroscopy was carried out for 
confirmation. Normal heart sounds were nec-
essary for this classification; cases with any
functional murmur were excluded. The normal pulse
and response to examination were in accord- 
ance with accepted standards. Normal chest 
findings were included; all doubtful cases 
were fluoroscoped or x-rayed. No case pre- 
senting any weakness in the inguinal canal 
was included in this class. Both eyes were re- 
quired to be 20/20 as tested by the Snellen 
chart. Those cases presenting color blindness 
were not included in this group. The internal 
and external examination of the nose was re- 
quired to be normal. Those cases which had 
tonsils were not included in this group. Hear- 
ing in both ears was regarded as normal if 
conversational tones were heard at 50 feet. 
The weight and height were important factors 
in this class; those cases presenting wide vari- 
ations were not included. The nervous and 
osseous systems were regarded as normal if no
defect was found. The urine was required to
be free of albumen and sugar, and negative 
as to the microscopic examination. 


Class B 

The main difference between Class A and 
Class B does not rest in the cardiac or lung 
findings. In fact, very little difference existed
between these groups as far as health findings
were concerned. This group was re-
quired to have normal heart and chest find-
ings. Any hernia excluded the patient from
the group. Eyes which showed 20/20 vision 
in both eyes with slight correction were in- 
cluded in this class. Slight septal deviation in 
the nose was included in the class. History of 
occasional colds in patients placed them in the 
B class. 


Class C 


The heart and lung findings were required 
to be normal as set up in Class A. Hernia 
which had been repaired for at least eighteen 
months was included in this group. The cases 
with vision of 20/20, with correction of a 
mild hyperopia or myopia, were placed in this 
group. Any septal deviation with evidence of 
accessory sinus infections or polypi placed the 
patient in this group. The presence of fre- 
quent colds in the winter was taken into con- 
sideration in this class. The presence of ton- 
sils and adenoids with evidence of gross in- 
fection was sufficient to place the patient in 
the group. Slightly underweight individuals 
were placed in this group. History of rheu- 
matic fever in childhood was regarded as 
sufficient evidence for this class. The main 
differences between this class and Class B rest in
the mild refractive errors of the eyes, the 
history of frequent colds with evidence of 
accessory sinus disease, the presence of in- 
fected tonsils, and the history of rheumatic 
fever or childhood tuberculosis. 


Class D 


Blood pressure over 130 mm (systolic) 
after repeated examination, with normal car- 
diac measurements, was included in this class. 
Any history of chronic bronchitis was suffi- 
cient evidence for inclusion in this group. 
Uncorrected herniae were rated Class D. Eyes
which presented less than 20/20 with correc- 
tion were included in this class. Congenital 
cataract, loss of one eye, corneal opacities, 
and progressive myopia were classed in this
group. The finding of marked septal devia-
tion with loss of aeration and with polypi
present was sufficient for placement here.
Previous ear infections with subsequent 
mastoid infections were included. Repeated 
respiratory infections were also grouped in 
this class. 


Class E 

A difference between Class D and Class E 
lay in the blood pressure findings. Any blood 
pressure over 140 mm (systolic) was included 
in this group. Slight increase in heart meas- 
urements with the presence of organic heart 
murmurs was evidence for inclusion here. 
Evidence of arrested tuberculosis or asthma 
placed the patient in this class. The history 
of peptic ulcer, with or without physical findings,
placed the patient in this group. Uncorrected
and scrotal herniae were placed here.
Marked deafness in one or both ears was evi- 
dence for inclusion here. High grade defective 
visions were placed here. Mild renal diseases 
were placed in this group, as were definite 
changes in the B. M. R. findings. 


Class F 

This group carried all the high-grade de- 
fects found in the participants in the study. 
Any hyperpiesia over 150 mm was included 
here. Mild decompensating heart disease was 
placed here. Active tuberculosis was sufficient 
for inclusion here. Diabetes mellitus, cardio- 
vascular, renal disease, and severe nervous 
disorders were placed in this class. 

In a similar manner various functions and 
defects were listed which would place an in- 
dividual in the various classes according to 
the weekly physical examination. Classifica- 
tions of the twelve weekly physical examina- 
tions for each student reveal the following 
distribution: 


TABLE IV 


DISTRIBUTION OF STUDENTS ACCORDING TO 
WEEKLY PHYSICAL EXAMINATION RATING 


Number of 
Students 










CRITERION FOR CLASSIFYING SUBJECTS ON
THE BASIS OF WEEKLY PHYSICAL
EXAMINATIONS


Class A

1. Normal temperature throughout the
experiment
2. Normal throat throughout the experiment
3. Normal blood pressure throughout the
experiment
4. Normal pulse and response throughout
the experiment
5. Maintenance of normal weight or increase
in weight
6. No illness of any sort


Class B

1. Normal temperature throughout the
experiment
2. Normal throat throughout the experiment
3. Normal blood pressure throughout the
experiment
4. Normal pulse and response
5. Maintenance of normal weight
6. History of one cold during the experiment,
with recovery in 2 weeks; occasional
headache


Class C

1. Normal temperature throughout the
experiment
2. Occasional simple naso-pharyngitis with
recovery within 1 week
3. Normal blood pressure
4. Pulse normal or very slightly elevated
5. Maintenance of normal weight
6. History of 2 or more colds during the
experiment with recovery in each instance
within two weeks
7. Vision disturbances


Class D

1. Normal temperature during the experiment
2. Normal blood pressure
3. Frequent attacks of naso-pharyngitis
during experiment
4. Pulse elevated above 90 consistently
5. Abnormal loss of weight or abnormal
gain of weight
6. History of frequent colds with delayed
recovery or other minor illnesses


Class E

1. Variation in blood pressure with increase
in both systolic and diastolic of
10 points
2. Temperature elevated with explainable
cause
3. Frequent attacks of naso-pharyngitis;
chronic cough, or hoarseness
4. Abnormal loss or gain of weight
5. Pulse consistently elevated over 100
6. History of constant colds, accompanying
sinusitis
7. Confinement in bed with illness not
lasting over 10 days


Class F

1. Variation in blood pressure with increase
in both systolic and diastolic
above 150
2. Temperature consistently elevated
3. Frequent attacks of naso-pharyngitis
with chronic cough; hemoptysis
4. Major illness during experiment; confinement
in bed for remainder of term
5. Pulse consistently over 100
6. Constant colds with severe sinusitis
7. Great loss or gain of weight


STATISTICAL ANALYSIS OF DATA 


In order to test the potency of each of the 
several factors which may or may not be 
present in the kind of learning situation 
studied as measured by the five criteria of 
learning success, the simple correlation be- 
tween each of the twenty factors and each of 
the criteria was calculated. The resultant 
coefficients of correlation together with their 
probable errors and the number of paired 
observations on which each coefficient is 
based are listed in Table V. 
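
The computation described above can be sketched as follows. The study does not state the formula used for the probable error; this sketch assumes the conventional one, P.E. = .6745 (1 - r²) / √N. The function names and the value N = 100 are illustrative assumptions, not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Simple (product-moment) correlation between two paired series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired series of equal length (n >= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def probable_error(r, n):
    """Conventional probable error of r (assumed formula, see lead-in)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Hypothetical check: r = .42 on an assumed 100 paired observations.
pe = probable_error(0.42, 100)
```

With r = .42 and an assumed N of 100, the sketch gives a probable error of about .056.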


INTERPRETATION OF RESULTS 


Within each row of Table V that factor 
having the highest coefficient of correlation 
with the corresponding criterion has the 
highest predictive value of the factors studied 
when taken singly. The following observa- 
tions may be drawn from the data: 

1. When the consistency of the data is 
considered, we observe that three of the 
criteria—teacher’s marks, departmental test 
averages, and the Purdue Test Form B—are 
highly comparable measures. 







2. When the percentage of gain in achieve- 
ment is used as the achievement criterion, 
we find considerable change in the size of 
the coefficient. This change is probably due 
to the fact that certain known factors in 
achievement such as intelligence, initial
ability, high school grades, are partially con- 
trolled and place the students on a more 
nearly equal basis. 


3. The table of coefficients of correlation 
seems to indicate that the raw score difference 
between pre-test and final test is a poor meas- 
ure of achievement. The fact that coefficients 
are almost wholly in reverse relationship is 
due to the terrific handicap placed on the 
brighter students who could not possibly 
make the gain that is possible for the poorer 
students to make. It is difficult to improve at 
a rapid rate when one starts very high upon 
the learning curve. 


4. The factors having the greatest pre- 
dictive value for achievement in Freshman 
English as measured by teacher’s marks, de- 
partmental test averages, and the Purdue 
Test Form B are initial ability in English, 
intelligence, reading comprehension and high 
school records, both general and English 
averages. We might say that these factors are 
similar and closely related if the consistency 
of the data is considered. 


5. Both measures of health show con- 
sistently high coefficients of correlation with 
all of the criteria. It is of particular sig- 
nificance that weekly health status is a very 
important factor in English achievement 
when measured by departmental test aver- 
ages, and of almost equal significance is the 
health factor measured by the comprehensive 
physical examination when correlated with 
percentage of gain in achievement. 


6. Using r = .50, either health measure
alone gives E = 1 - k = .13, where
k = √(1 - r²) is the coefficient of alienation.
This means that by using a prediction
equation involving the health measure, we
can predict achievement 13% better than by
chance selection.
Such factors as intelligence, initial ability in
English, high school average, and reading
comprehension correlate with the criteria with
an r of approximately .75 to give E = 1 - k
= .34. Thus the health measures are about
one-third as efficient in predicting achievement
as the traditional and commonly
accepted “best” measures.
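
The arithmetic in item 6 can be checked with a short sketch. It assumes that k is the coefficient of alienation, √(1 - r²), so that 1 - k is the index of forecasting efficiency; the function name is illustrative.

```python
import math


def forecasting_efficiency(r):
    """Return 1 - k, where k = sqrt(1 - r^2) is the coefficient of alienation."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    k = math.sqrt(1.0 - r * r)
    return 1.0 - k


health = forecasting_efficiency(0.50)       # either health measure
traditional = forecasting_efficiency(0.75)  # intelligence, initial ability, etc.
```

Rounded to two places these give .13 and .34, the two values quoted in item 6.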




7. Relative to criterion D (percentage of
gain in achievement), the health factor
(physical examination) is just as efficient
(barring chronological age) as the best of
traditional measures.

8. The two health measures and chrono- 
logical age are the only three measures that 
are significantly associated with raw score 
gain. 

9. Intelligence, high school average in
English, and general high school average
seem to have a similar high predictive value
when the criteria A, B, and C are used, and
a slightly less significant value when D is
used, but when E is used the predictive value is
negligible.

10. Initial ability in English as measured 
by the Cooperative English Test has an 
extremely high predictive value when criteria 
A and B are used, and the highest of all the 
factors when criterion C is employed. It also 
correlates significantly with criterion D, but 
is of no significance when raw score gain is 
used. 

11. Reading comprehension has an impor- 
tant predictive value when criteria A, B, C 
are used as achievement measures; it has a 
positive value when percentage of gain is 
used, but is of no significance when raw score 
gain is used. 

12. Study Habits and reading rate have 
a positive predictive value except when raw 
score gain is used as the criterion, the most 
significant predictive relationship being when 
Purdue Form B is used as the measure of 
achievement. 

13. Average study time as reported by the 
students has no significance with any criterion 
used. 

14. Apparently, the older pupils tend to 
make larger raw score and percentage gains 
but receive lower final test scores, depart- 
mental test averages, and teacher’s marks 
than do the younger pupils. 

15. Leadership as measured by the Morris 
Trait L has a positive predictive value with 
all the criteria except raw score gain. 

16. Socio-economic status seems to have 
no predictive value for achievement in Eng- 
lish as measured by the five criteria probably 
because of the homogeneity of the group. 

17. Social adjustment seems to have no 
predictive value when correlated with the five 
criteria. 





TABLE V

MASTER TABLE SHOWING COEFFICIENTS OF CORRELATION
WITH NUMBER OF CASES AND PROBABLE ERRORS

[Table V was printed sideways; its entries (coefficient, probable error, and N for each factor against each criterion, together with the correlation between both health measures) are not legible in this copy. The headings and footnotes follow.]

Measures of Achievement in English* (rows):
A. Teacher's Marks
B. Departmental Test Ave.
C. Scores on Purdue B
D. Percentage Gain, Purdue Forms A & B
E. Raw Score Gain, Purdue Forms A & B

Factors (columns):
I. Phys. Exam.†
II. Weekly Record†
III. Intelligence
IV. Initial Ability in English
V. General H. S. Ave.
VI. H. S. English Average
VII. Reading Comprehension
VIII. Reading Rate
IX. Study Habits
X. Average Study Time
XI. Chronological Age
XII. Leadership
XIII. Socio-Economic Status
XIV. Social Adjustment
XV. Personality (Bernreuter Test): B1-N Neurotic Tendency; B2-S Self-Sufficiency; B3-I Introversion-Extroversion; B4-D Dominance-Submission; F1-C Confidence; F2-S Sociability

* For a discussion of measures of achievement in English see p. 236.
† For a discussion of the two health measures see p. 236.
III. American Council Psychological Examination (See p. 236).
IV. Cooperative English Test (See p. 236).
V. General high school average (See p. 236).
VI. High School English Average (See p. 236).
VII. Iowa Silent Reading Test, Form A (See p. 236).
VIII. Iowa Silent Reading Test, Form A (See p. 236).
IX. Wrenn Study Habits Inventory (See p. 236).
X. Average Study Time (See p. 236).
XI. Chronological Age (See p. 236).
XII. Morris Trait L (See p. 236).
XIII. Sims Socio-Economic Score Card (See p. 236).
XIV. Washburne Social Adjustment Inventory (See p. 236).
XV. Bernreuter Personality Inventory (See p. 236).


18. With the exception of B2-S (Self- 
Sufficiency) and F2-S (Sociability), none of 
the Bernreuter traits show any significant 
relationship with the five criteria. 

The relative efficiency of the factors taken 
singly in predicting achievement in English 
as measured by the five criteria is shown in 
Table VI. The Purdue Test Form B, teacher's 
marks, and the departmental test averages are 
considered comparable measures of achieve- 
ment in English as shown by the consistency 
of the data, and the factors correlated with 
these data place themselves in the order in- 
dicated in Column I when average rank with 
the three comparable criteria is computed. 
When the percentage of gain is used as the 
criterion of achievement in English, the 
factors rank themselves in the order indicated 
in Column II; Column III indicates the rank
of the factors when raw score gain is the 
criterion. 

The health relationship with high school 
English average, general high school average, 
and intelligence is sufficiently high to say 
that the correlation trend is positive in each 
case. 




PARTIAL CORRELATIONS 


1. Achievement in English as measured by
the Purdue Test Form B with health when
intelligence is held constant: r = .25.

2. Achievement as measured by the Purdue
Test Form B with intelligence when health
is held constant: r = .79.

Thus we see, as the partial coefficients of 
correlation indicate, that intelligence as 
measured by the American Council Psycho- 
logical Test is a more important predictive 
factor than health, but it is also obvious that 
health is a positive factor. 

3. Achievement (as measured by the percentage
of gain between Purdue Test Form A
and Purdue Test Form B) with health when
intelligence is held constant: r = .50.

4. Achievement (as measured by the percentage
of gain between Purdue Test Form A
and Purdue Test Form B) with intelligence
when health is held constant: r = .32.

Again we see from the partial coefficients
that health becomes an even more important
factor than intelligence when the percentage
of gain is used as the criterion for achievement.


TABLE VI

Rank  Column I             Column II            Column III

 1.   H. S. Eng. Ave.      Chron. Age           Chron. Age
 2.   Initial Eng. A.      Health (Phys. Ex.)   Weekly Health Status
 3.   *Intelligence        Initial Eng. A.      Health (Phys. Ex.)
 4.   *Gen. H. S. Ave.     H. S. Eng. Ave.      Ave. Study Time
 5.   Reading Compreh.     Gen. H. S. Ave.      Leadership
 6.   Weekly Health S.     Reading Compreh.     Socio-Economic S.
 7.   Health (Phys. Ex.)   Intelligence         Study Habits
 8.   Reading Rate         Weekly Health S.     Gen. H. S. Ave.
 9.   Study Habits         Reading Rate         H. S. Eng. Ave.
10.   Leadership           Leadership           Dominance-Submis.
11.   Self-Suffic.         Study Habits         Social Adjust.
12.   Sociability          Self-Suffic.         *Intelligence
13.   Introver.-Extro.     Sociability          *Confidence
14.   Socio-Econ. S.       Socio-Econ. S.       *Self-Suffic.
15.   Social Adjust.       Introver.-Extro.     Introver.-Extro.
16.   *Neurotic Tend.      *Neurotic Tend.      Reading Rate
17.   *Confidence          *Social Adjust.      Initial Eng. Abil.
18.   Dominance            Ave. Study Time      Reading Comprehen.
19.   Ave. Study Time      Confidence           Neurotic Tend.
20.   Chron. Age           Dominance            Sociability

* A few intercorrelations were computed in order to determine more significant results. 


Measures                                        r       P.E.
Intelligence with Health (Phys. Exam.)         .42  ±  .056
Intelligence with Chronological Age            .22  ±  .064
Intelligence with High School Eng. Ave.        .85  ±  .017   (N = 102)
Health (Phys. Exam.) with Chron. Age           .47  ±  .057
Health (Phys. Exam.) with H. S. Eng. Ave.      .50  ±  .05
[label illegible]                              .59  ±  .043
[label illegible]                              .88  ±  .016










If we turn to the high school average in 
English as a measure of academic achieve- 
ment in English, we find the following 
significant partial coefficients of correlation: 

5. Health and high school English average
with intelligence held constant: r = .3.

6. High school English average and intelligence
with health held constant: r = .9.

Again we see that health is a positive 
factor in achievement as the size of its 
coefficient indicates a positive correlation 
trend. 
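
The article does not print the formula behind these partial coefficients. A minimal sketch, assuming the standard first-order partial correlation, reproduces item 5 from the intercorrelations reported above (health with high school English average .50, intelligence with health .42, intelligence with high school English average .85). The function and argument names are illustrative.

```python
import math


def partial_r(r_12, r_13, r_23):
    """Correlation of variables 1 and 2 with variable 3 held constant."""
    denominator = math.sqrt((1.0 - r_13 ** 2) * (1.0 - r_23 ** 2))
    return (r_12 - r_13 * r_23) / denominator


# 1 = health (physical examination), 2 = high school English average,
# 3 = intelligence (the variable held constant).
health_with_english = partial_r(r_12=0.50, r_13=0.42, r_23=0.85)
```

Under these assumptions the result is about .30, in line with the value given in item 5.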


CONCLUSIONS 


The author is fully cognizant of the several 
limitations present in this study, such as 
small sample, differences in opinion as to the 
real meaning of the words health and achieve- 
ment, the health measures, and the limitation 
present when the correlation technique is 
used; but in spite of such limitations there 
seem to be several significant findings. 

1. Any attempt to measure academic 
achievement should come only after a careful 
study of the meaning of the word since there 
are so many possible criteria. Five such 
criteria were used in the present study, and 
there are three outstanding differences evi- 
denced when the consistency of the data is 
observed: status, percent of improvement, 
and raw score improvement. 

2. When academic achievement in English 
is considered from the “status” viewpoint,
the factors of intelligence, initial ability in 
English, general high school average, high 
school English average, and reading compre- 
hension offer the best predictive indices, with 
initial ability in English as the best index. 

3. Since college freshmen usually represent 
a rather homogeneous group, social and 
economic factors in achievement, as measured 
by the tests employed in this study, are of 
little predictive value. 

4. The coefficients of correlation between 
the two health measures and the five criteria 
of achievement in English are consistently 
higher than the correlations reported in 
similar studies, probably because of the 
comprehensive health measures employed 
here. 

5. When an individual’s health is studied 
from the beginning to the end of a school 
term, his health status during that period is 
significantly correlated with his achievement. 




6. If achievement is measured by depart- 
mental test averages alone, then the health 
of the individual at the time of the tests is 
one of the most important factors conditioning
his achievement. There is a possible
neurological explanation for such a relation- 
ship. 

7. Health status (as measured by the 
comprehensive physical examination) and 
health function (as measured by the weekly 
health examination) are as consistently high 
in predicting achievement as any of the 
factors included in this study. 


8. If intelligence (as measured by a 
standard intelligence test), general high 
school average, high school English average, 
and reading comprehension are considered as 
a single measure, since they include so many
common elements, then the second most 
important factor in predicting achievement 
in English is health. 


9. A pre-test, such as the Purdue Test or
the Cooperative English Test, is of definite
value in grouping students according to their
probable achievement in freshman English in
order to eliminate the teaching waste usually
associated with a wide spread of abilities.


10. If achievement is to be measured in 
terms of individual improvement, then the 
percent of gain made in terms of possible 
gain is a reliable measure. On the other hand, 
the raw score difference between a pre-test 
and a final test is not a good measure of 
improvement because of the inconsistency of 
the gap occurring in the distribution curves. 
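
The contrast drawn in conclusion 10 can be illustrated with a short sketch. It assumes that "percent of gain made in terms of possible gain" means the raw gain divided by the gain still available at the pre-test; the maximum score of 100 and the two students are hypothetical.

```python
def percent_of_possible_gain(pre_score, final_score, max_score):
    """Raw gain expressed as a percentage of the gain that was still possible."""
    possible_gain = max_score - pre_score
    if possible_gain <= 0:
        raise ValueError("pre-test score leaves no room for gain")
    return 100.0 * (final_score - pre_score) / possible_gain


# Hypothetical students on a test with an assumed maximum of 100 points.
weaker = percent_of_possible_gain(pre_score=40, final_score=70, max_score=100)
stronger = percent_of_possible_gain(pre_score=80, final_score=90, max_score=100)
```

Both hypothetical students realize 50 percent of their possible gain, although their raw gains are 30 and 10 points. This is the handicap on initially high scorers that the raw score difference carries.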


11. When the twenty measures employed 
in the present study are considered in rank
order as they affect achievement, the two 
health measures rank above all with the 
exception of the conventionally accepted 
measures of intelligence, high school average, 
and aptitude. 


12. The older students, in this study, tend 
to make poorer final grades on the course, 
possibly because of retardation in high school 
or delayed college entrance, but they tend 
to make greater improvement because of low 
initial ability and greater application. 


As a result of this study, the following 
recommendations are ventured: 


1. Because of the apparent health factor
in scholastic achievement, colleges and universities
should use their health service departments
for educational purposes by making
health ratings of individual students available
to members of the faculty.

2. In studies where a measure of health 
is to be employed, the comprehensive physical 
examination or a health case study should 
be used instead of the indices used so often 
in the past, such as height, weight, etc. 

3. In studies conditioning achievement, the
measures should be grouped into categories
because of the many common elements
present in a group of isolated measures.




4. Medically trained individuals should 
attempt to discover methods whereby individ- 
uals can be rated objectively on comprehen- 
sive examinations. 


5. In order to determine more valid con- 
clusions, studies similar to this should be 
made by other investigators with different 
groups of students living and working under 
conditions different from those in this in- 
vestigation. With larger samples there would 
be a sufficiently high number of cases in each 
health category to study the characteristics 
associated with it. 























A TEST OF THE ASSUMPTIONS OF LINEARITY AND 
HOMOSCEDASTICITY MADE IN ESTIMATING THE 
CORRELATION IN ONE RANGE FROM THAT 
OBTAINED IN A DIFFERENT RANGE 


TEOBALDO CASANOVA 
Department of Education of Puerto Rico, San Juan, P. R. 


If the correlation of two tests within the 
range of one grade is known while one of the 
standard deviations for the range of several 
grades is available, the correlation in the 
larger range is usually estimated through the 
following formula: 

$$\sigma_y^2\,(1 - r_o^2) = \Sigma_y^2\,(1 - R_{e(y)}^2) \tag{1}$$

where $r_o$ is the correlation obtained in the
small range, $R_{e(y)}$ the correlation estimated
for the large range from the y-variable,
$\sigma_y$ the standard deviation in the small range,
and $\Sigma_y$ the standard deviation in the large
range.
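
As an illustration, equation (1) can be solved for the large-range correlation. This is a minimal sketch; the function name and the numerical values are assumptions chosen for the example, not data from the article.

```python
import math


def large_range_r(r_small, sd_small, sd_large):
    """Solve equation (1) for the correlation in the large range."""
    if sd_small <= 0 or sd_large < sd_small:
        raise ValueError("expected 0 < sd_small <= sd_large")
    if not -1.0 <= r_small <= 1.0:
        raise ValueError("r_small must lie between -1 and 1")
    r_squared = 1.0 - (sd_small ** 2 / sd_large ** 2) * (1.0 - r_small ** 2)
    # Keep the sign of the small-range correlation.
    return math.copysign(math.sqrt(r_squared), r_small)


# Hypothetical case: r = .60 within one grade, S.D. 5 there and 10 over several grades.
estimate = large_range_r(r_small=0.60, sd_small=5.0, sd_large=10.0)
```

When the two standard deviations are equal, the function returns the small-range correlation unchanged, as equation (1) requires.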

This formula was derived by Kelley¹
assuming that the correlation in the small
range is the result of the curtailment of the
distribution of the x-variable in the scatter
diagram for the large range, that such curtailment
affects the y-variable only in a
consequential manner, and that the y-arrays
are homoscedastic and show rectilinear
regression in the scatter diagram for the large
range. Then the slope of the line through the
means of the y-arrays is not changed by the
curtailment and the regression coefficients for
both ranges are equal. Hence,


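$$\frac{\sigma_y^2}{\sigma_x^2}\,r_o^2 = \frac{\Sigma_y^2}{\Sigma_x^2}\,R_{e(y)}^2 \tag{2}$$

Dividing (2) by (1),

$$\frac{r_o^2}{\sigma_x^2\,(1 - r_o^2)} = \frac{R_{e(y)}^2}{\Sigma_x^2\,(1 - R_{e(y)}^2)} \tag{3}$$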


This formula is useful in estimating $R_{e(y)}$
when $\Sigma_x$, the standard deviation of the variable
whose distribution is assumed to be curtailed,
is available. It was originally derived
by Pearson² in a different manner, and given
later by Kelley³ in the present form.


1 Kelley, T. L., Statistical Method, p. 224. New York: The
Macmillan Co., 1923.
2 Pearson, K., On the Influence of Natural Selection on the
Variability and Correlation of Organs. Phil. Trans. Roy. Soc.
of London, A, Vol. 200, 1902.
3 Kelley, T. L., Op. cit., p. 28.


If the distribution of the y-variable is
assumed to be curtailed while that of the
x-variable is assumed to be rectilinear and
homoscedastic, the equations corresponding
to (1), (2) and (3) are:

$$\sigma_x^2\,(1 - r_o^2) = \Sigma_x^2\,(1 - R_{e(x)}^2) \tag{1a}$$

$$\frac{\sigma_x^2}{\sigma_y^2}\,r_o^2 = \frac{\Sigma_x^2}{\Sigma_y^2}\,R_{e(x)}^2 \tag{2a}$$

$$\frac{r_o^2}{\sigma_y^2\,(1 - r_o^2)} = \frac{R_{e(x)}^2}{\Sigma_y^2\,(1 - R_{e(x)}^2)} \tag{3a}$$

where the notation is the same as before, but
in terms of the x-variable.


Textbooks in educational statistics do not
agree as to which of the four equations (1),
(1a), (3) or (3a) is to be used in estimating
the correlation in a wide range from that obtained
in a narrow range. Garrett⁴ gives only
(1) and (1a) while Holzinger⁵ omits these
and recommends the use of (3) and (3a).
The common practice is to use (1) when only
$\Sigma_y$ is known and (1a) when only $\Sigma_x$ is available.
(3) and (3a) have not been so widely
used due, perhaps, to the simplicity of the
former two. But it is evident that unless

$$\frac{\Sigma_x}{\sigma_x} = \frac{\Sigma_y}{\sigma_y}$$

(1) and (1a) will yield different results, and
that the same is true about (3) and (3a).
Moreover, if the y-variable is not strictly
rectilinear throughout both ranges, the values
obtained from (1) and (3) will differ; and
unless the x-variable is strictly rectilinear
throughout both ranges, (1a) and (3a) will
yield varying results. Therefore, in the case
that both $\Sigma_x$ and $\Sigma_y$ are known and that the
two foregoing conditions are not fulfilled by
the distributions, there will be four different


4 Garrett, H. E., Statistics in Psychology and Education,
p. 304. New York: Longmans, Green and Co., 1937.

5 Holzinger, K. J., Statistical Methods for Students of
Education, p. 172. New York: Ginn and Co., 1928.




solutions to one and the same problem. In
the absence of a method providing for a
unique solution, the establishment of criteria
for the selection of the variable whose distribution
is assumed to be rectilinear and
homoscedastic is necessary. Furthermore, it
is still possible that neither the x-distribution
nor the y-distribution is truly linear
and homoscedastic in the sense that the value
of $R_e$ shall lie within the error of random
sampling. The purpose of this article is, then,
to propose a test of the assumptions of rectilinearity
and homoscedasticity throughout
both ranges that will enable one to choose the
most suitable variable on which a fair estimate
of $R_o$ may be based, or to conclude that
none of the variables may provide a solution
within the error of random sampling.

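Imagine a scatter diagram for a large range
in which the distribution of the y-variable
is strictly rectilinear and homoscedastic. Let
$r_{e(y)}$ be the correlation within the small
range, and $R_{e(y)}$ be the correlation for the
large range. Equation (1) may then assume
the following form:

$$\frac{\Sigma_y^2}{\sigma_y^2} - R_{e(y)}^2\,\frac{\Sigma_y^2}{\sigma_y^2} = 1 - r_{e(y)}^2$$

But by (2),

$$R_{e(y)}^2 = r_{e(y)}^2\,\frac{\Sigma_x^2}{\sigma_x^2}\cdot\frac{\sigma_y^2}{\Sigma_y^2}$$

Whence,

$$\frac{\Sigma_y^2}{\sigma_y^2} - r_{e(y)}^2\,\frac{\Sigma_x^2}{\sigma_x^2} = 1 - r_{e(y)}^2$$

and,

$$r_{e(y)} = \frac{\sqrt{\dfrac{\Sigma_y^2}{\sigma_y^2} - 1}}{\sqrt{\dfrac{\Sigma_x^2}{\sigma_x^2} - 1}}$$

For the sake of simplicity, let

$$\frac{\Sigma_x^2}{\sigma_x^2} = P_x^2, \quad \text{and} \quad \frac{\Sigma_y^2}{\sigma_y^2} = P_y^2$$

then,

$$r_{e(y)} = \frac{\sqrt{P_y^2 - 1}}{\sqrt{P_x^2 - 1}} \tag{4}$$

If instead, the distribution of the x-variable
is the one assumed to be rectilinear and
homoscedastic, the corresponding equation is

$$r_{e(x)} = \frac{\sqrt{P_x^2 - 1}}{\sqrt{P_y^2 - 1}} \tag{4a}$$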


$r_{e(y)}$ is a function of the four standard
deviations and it imposes the necessary conditions
for the exact estimate of $R_{e(y)}$ under
the assumptions of rectilinearity and homoscedasticity
in the distribution of the y-variable
throughout both ranges. The same assertion
holds for $r_{e(x)}$ in respect to the
x-variable.


If the correlation in the large range is 
known, say the correlation of two tests within 
the range of several grades, the correlation for 
the range of one grade may be determined 
from equations (1) or (3). This would be a 
correction for heterogeneity of the population. 
Let y be the variable whose distribution is 
rectilinear and homoscedastic. Substituting 
the value of $r_{e(y)}$ given by (4) for $r_o$ in (2)
and changing there R,,,, for R.,,), 


$$R_{e(y)} = \frac{P_x\sqrt{P_y^2 - 1}}{P_y\sqrt{P_x^2 - 1}} \tag{5}$$

When the estimate is based on the x-variable,
the corresponding equation is

$$R_{e(x)} = \frac{P_y\sqrt{P_x^2 - 1}}{P_x\sqrt{P_y^2 - 1}} \tag{5a}$$


As in (4) and (4a), (5) and (5a) impose
the necessary conditions for the correct calculation
of $r_{e(y)}$ and $r_{e(x)}$ respectively from
$R_{e(y)}$ and $R_{e(x)}$ under the stated assumptions.


If the four standard deviations are available,
the values of $r_{e(y)}$ and $r_{e(x)}$ may be compared
to that of $r_o$, the correlation obtained
for the small range. If $r_o = r_{e(x)}$, $R_{e(x)}$ may
be obtained through (1a). If $r_o = r_{e(y)}$, $R_{e(y)}$
may be estimated from (1). If $r_o \neq r_{e(x)}$ and
$r_o \neq r_{e(y)}$, the significance of the difference
must be determined in each case.
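
A short numerical check may make the agreement case concrete. The sketch assumes that the obtained small-range correlation equals the value given by (4), and confirms that the estimate from (1) then coincides with the value given by (5). The function names and numbers are illustrative.

```python
import math


def r_large_from_eq1(r_small, p_y):
    """Equation (1) solved for R, with P_y = large-range S.D. / small-range S.D. of y."""
    return math.sqrt(1.0 - (1.0 - r_small ** 2) / p_y ** 2)


def r_large_from_eq5(p_x, p_y):
    """Equation (5): R_e(y) expressed through the two S.D. ratios."""
    return p_x * math.sqrt(p_y ** 2 - 1.0) / (p_y * math.sqrt(p_x ** 2 - 1.0))


p_x = 2.0                 # hypothetical ratio for the x-variable
p_y = math.sqrt(2.08)     # hypothetical ratio for the y-variable
r_o = math.sqrt(p_y ** 2 - 1.0) / math.sqrt(p_x ** 2 - 1.0)  # equals r_e(y) by (4)

via_eq1 = r_large_from_eq1(r_o, p_y)
via_eq5 = r_large_from_eq5(p_x, p_y)
```

Here both routes give a large-range correlation of about .83. If the obtained correlation differed from the value given by (4), the two routes would no longer agree, which is the situation the significance test is meant to settle.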


The significance of a difference may be
deduced from the critical ratio, that is, from
the ratio of the difference to the standard
error of the difference. If d stands for
$r_{e(y)} - r_o$, then $\sigma_d$, the standard error of the
difference, is

$$\sigma_d = \sqrt{\sigma_{r_{e(y)}}^2 + \sigma_{r_o}^2 - 2\,r_{r_{e(y)} r_o}\,\sigma_{r_{e(y)}}\,\sigma_{r_o}}$$

If $r_{e(y)}$ and $r_o$ are uncorrelated,⁶

$$\sigma_d = \sqrt{\sigma_{r_{e(y)}}^2 + \sigma_{r_o}^2}$$


6 This assumption is made in order to obtain an approximation
to the value of $\sigma_d$. The correct value can only be
obtained by taking this correlation into account and evaluating
it. This I have been unable to do, but I think that the
approximation offered is better than nothing; and I hope that
some mathematician will take the time to derive the exact
value of this standard error.


March, 1939] LINEARITY AND HOMOSCEDASTICITY 


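and the critical ratio, CR, is,

$$\mathrm{CR} = \frac{r_x - r_{e(y)}}{\sqrt{\sigma_{r_x}^2 + \sigma_{r_{e(y)}}^2}} \tag{6}$$

$\sigma_{r_x}$ may be easily calculated from the approximate formula,

$$\sigma_{r_x} = \frac{1 - r_x^2}{\sqrt{n}} \tag{7}$$

which involves the assumptions of rectilinearity, homoscedasticity, and mesokurtosis. $n$ is the number of cases in the small range.

To obtain the CR, the value of $\sigma_{r_{e(y)}}$ is needed. For convenience equation (4) is reproduced below.

$$r_{e(y)} = \frac{\sqrt{P_y^2 - 1}}{\sqrt{P_x^2 - 1}} \tag{4}$$

Taking logarithmic differentials,

$$\log r_{e(y)} = \tfrac{1}{2}\log\,(P_y^2 - 1) - \tfrac{1}{2}\log\,(P_x^2 - 1)$$

and,

$$\frac{d\,r_{e(y)}}{r_{e(y)}} = \frac{P_y\,dP_y}{P_y^2 - 1} - \frac{P_x\,dP_x}{P_x^2 - 1}$$

Squaring, summing, and dividing by the number of samples,

$$\frac{\sigma_{r_{e(y)}}^2}{r_{e(y)}^2} = \frac{P_y^2\,\sigma_{P_y}^2}{(P_y^2 - 1)^2} + \frac{P_x^2\,\sigma_{P_x}^2}{(P_x^2 - 1)^2} - \frac{2\,P_y P_x\, r_{P_x P_y}\,\sigma_{P_x}\sigma_{P_y}}{(P_y^2 - 1)(P_x^2 - 1)} \tag{8}$$

But

$$P_y = \frac{\Sigma_y}{\sigma_y}$$

Taking logarithmic differentials,

$$\frac{dP_y}{P_y} = \frac{d\Sigma_y}{\Sigma_y} - \frac{d\sigma_y}{\sigma_y}$$

Squaring, summing, and dividing by number of samples,

$$\frac{\sigma_{P_y}^2}{P_y^2} = \frac{\sigma_{\Sigma_y}^2}{\Sigma_y^2} + \frac{\sigma_{\sigma_y}^2}{\sigma_y^2} - \frac{2\, r_{\Sigma_y \sigma_y}\,\sigma_{\Sigma_y}\sigma_{\sigma_y}}{\Sigma_y\,\sigma_y}$$

If $\Sigma_y$ and $\sigma_y$ are uncorrelated,

$$\sigma_{P_y}^2 = P_y^2\left(\frac{\sigma_{\Sigma_y}^2}{\Sigma_y^2} + \frac{\sigma_{\sigma_y}^2}{\sigma_y^2}\right) \tag{9}$$

By well known formulas which assume mesokurtic distributions,

$$\sigma_{\Sigma_y} = \frac{\Sigma_y}{\sqrt{2N}} \tag{10}$$

$$\sigma_{\sigma_y} = \frac{\sigma_y}{\sqrt{2n}} \tag{10a}$$

$$\sigma_{\Sigma_x} = \frac{\Sigma_x}{\sqrt{2N}} \tag{10b}$$

$$\sigma_{\sigma_x} = \frac{\sigma_x}{\sqrt{2n}} \tag{10c}$$

where $N$ = number of cases in the large range and $n$ = number of cases in the small range. Substituting in (9) from (10) and (10a),

$$\sigma_{P_y}^2 = P_y^2\left(\frac{1}{2N} + \frac{1}{2n}\right) \tag{11}$$

Likewise,

$$\sigma_{P_x}^2 = P_x^2\left(\frac{1}{2N} + \frac{1}{2n}\right) \tag{11a}$$

Now $r_{P_x P_y}$ is the only unknown expression in equation (8). Its equivalent may be found as follows:

$$\log \frac{P_x}{P_y} = \log \Sigma_x - \log \sigma_x + \log \sigma_y - \log \Sigma_y$$

$$\frac{d\,(P_x/P_y)}{P_x/P_y} = \frac{d\Sigma_x}{\Sigma_x} - \frac{d\sigma_x}{\sigma_x} + \frac{d\sigma_y}{\sigma_y} - \frac{d\Sigma_y}{\Sigma_y}$$

Squaring, summing and dividing by the number of samples,

$$\begin{aligned}
\frac{\sigma_{P_x}^2}{P_x^2} + \frac{\sigma_{P_y}^2}{P_y^2} - \frac{2\, r_{P_x P_y}\,\sigma_{P_x}\sigma_{P_y}}{P_x P_y}
={}& \frac{\sigma_{\Sigma_x}^2}{\Sigma_x^2} + \frac{\sigma_{\sigma_x}^2}{\sigma_x^2} + \frac{\sigma_{\sigma_y}^2}{\sigma_y^2} + \frac{\sigma_{\Sigma_y}^2}{\Sigma_y^2} \\
&- \frac{2\, r_{\Sigma_x \sigma_x}\,\sigma_{\Sigma_x}\sigma_{\sigma_x}}{\Sigma_x\,\sigma_x} + \frac{2\, r_{\Sigma_x \sigma_y}\,\sigma_{\Sigma_x}\sigma_{\sigma_y}}{\Sigma_x\,\sigma_y} - \frac{2\, r_{\Sigma_x \Sigma_y}\,\sigma_{\Sigma_x}\sigma_{\Sigma_y}}{\Sigma_x\,\Sigma_y} \\
&- \frac{2\, r_{\sigma_x \sigma_y}\,\sigma_{\sigma_x}\sigma_{\sigma_y}}{\sigma_x\,\sigma_y} + \frac{2\, r_{\sigma_x \Sigma_y}\,\sigma_{\sigma_x}\sigma_{\Sigma_y}}{\sigma_x\,\Sigma_y} - \frac{2\, r_{\sigma_y \Sigma_y}\,\sigma_{\sigma_y}\sigma_{\Sigma_y}}{\sigma_y\,\Sigma_y}
\end{aligned} \tag{12}$$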
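The two standard errors that feed the critical ratio, equations (7) and (11), are short enough to evaluate directly. The following is a minimal Python sketch and not part of the original article; the function names are illustrative assumptions, and the inputs (P = 1.200, r = .80, N = 400, n = 100) are used only to show the arithmetic.

```python
import math


def se_r_small_range(r_x: float, n: int) -> float:
    """Equation (7): approximate standard error of r_x based on n cases."""
    return (1.0 - r_x ** 2) / math.sqrt(n)


def se_range_ratio(p: float, big_n: int, n: int) -> float:
    """Equations (11)/(11a): standard error of P = Sigma / sigma.

    Assumes, as the text does, that the large-range and small-range
    standard deviations are uncorrelated and the distributions mesokurtic.
    """
    return p * math.sqrt(1.0 / (2 * big_n) + 1.0 / (2 * n))


# Hypothetical inputs chosen for illustration.
se_r = se_r_small_range(0.80, 100)
se_p_y = se_range_ratio(1.200, 400, 100)
print(round(se_r, 4), round(se_p_y, 4))
```

Because (11) and (11a) share the factor 1/(2N) + 1/(2n), the relative error of P is the same for the x- and y-variables; only the multiplier P differs.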


By another well known formula which involves the usual assumptions,

$$r_{\Sigma_x \Sigma_y} = R_x^2 \tag{13}$$

and

$$r_{\sigma_x \sigma_y} = r_x^2 \tag{13a}$$

$R_x$ is the correlation that would be obtained in the large range.

Equation (12) may be greatly simplified by substituting in it the values of the standard errors given by (10), (10a), (10b) and (10c), the values of $\sigma_{P_x}$ and $\sigma_{P_y}$ obtained through (11) and (11a), and those of $r_{\Sigma_x \Sigma_y}$ and $r_{\sigma_x \sigma_y}$ given by (13) and (13a). Doing so, while assuming that $r_{\Sigma_x \sigma_x}$, $r_{\Sigma_x \sigma_y}$, $r_{\sigma_x \Sigma_y}$ and $r_{\sigma_y \Sigma_y}$ are all equal to zero,

$$2\, r_{P_x P_y}\,\sigma_{P_x}\sigma_{P_y} = P_x P_y\left(\frac{R_x^2}{N} + \frac{r_x^2}{n}\right) \tag{14}$$

As $R_x$ is not known, $R_{e(y)}$ may be put in its place without appreciably affecting the final result. Substituting in (8) from (11), (11a) and (14),

$$\frac{\sigma_{r_{e(y)}}^2}{r_{e(y)}^2} = \left(\frac{1}{2N} + \frac{1}{2n}\right)\left[\frac{P_y^4}{(P_y^2 - 1)^2} + \frac{P_x^4}{(P_x^2 - 1)^2}\right] - \frac{P_x^2 P_y^2\left(\dfrac{R_{e(y)}^2}{N} + \dfrac{r_x^2}{n}\right)}{(P_y^2 - 1)(P_x^2 - 1)} \tag{15}$$

The right-hand member of (15) is approximately equal to

$$\left(\frac{1}{2N} + \frac{1}{2n}\right)\left(\frac{P_y^2}{P_y^2 - 1} - \frac{P_x^2}{P_x^2 - 1}\right)^2$$

This value is smaller than the right-hand member of (15), but the approximation is generally close enough for all practical purposes if $N$ and $n$ are large. Then, to this degree of approximation,

$$\sigma_{r_{e(y)}} = r_{e(y)}\sqrt{\frac{1}{2N} + \frac{1}{2n}}\left(\frac{P_y^2}{P_y^2 - 1} - \frac{P_x^2}{P_x^2 - 1}\right) \tag{16}$$

Substituting in (6) this value and that given by (7) for $\sigma_{r_x}$,

$$\mathrm{CR} = \frac{r_x - r_{e(y)}}{\sqrt{r_{e(y)}^2\left(\dfrac{1}{2N} + \dfrac{1}{2n}\right)\left(\dfrac{P_y^2}{P_y^2 - 1} - \dfrac{P_x^2}{P_x^2 - 1}\right)^2 + \dfrac{(1 - r_x^2)^2}{n}}} \tag{17}$$

When the correlation in the large range is known and that within the small range is to be estimated, if the y-variable is assumed to be rectilinear and homoscedastic, the standard error of $R_{e(y)}$ is needed in order to obtain an equation corresponding to (17). This derivation is not given here as it is too long and too similar to that just given for the standard error of $r_{e(y)}$. Making the same assumptions as before, this is,
I 
—— Reon'(Sxt aM arapt i 75) 
R.? Fecy) I " 
—( N + u oar) ey (15a) 
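The statement that the approximation underlying (16) is smaller than the right-hand member of (15) can be checked numerically. The sketch below is not from the article: the function names are assumptions, formula (15) is coded as it reads after substituting (11), (11a) and (14) into (8), and the large-range correlation is passed in as a free parameter because its value is not stated at this point in the text.

```python
def _shared_terms(p_y: float, p_x: float, big_n: int, n: int):
    """Return the factor 1/(2N) + 1/(2n) and the two ratios P^2 / (P^2 - 1)."""
    a = 1.0 / (2 * big_n) + 1.0 / (2 * n)
    t_y = p_y ** 2 / (p_y ** 2 - 1.0)
    t_x = p_x ** 2 / (p_x ** 2 - 1.0)
    return a, t_y, t_x


def rel_var_eq15(p_y, p_x, big_r, r_x, big_n, n):
    """Right-hand member of (15): squared relative error of r_e(y)."""
    a, t_y, t_x = _shared_terms(p_y, p_x, big_n, n)
    cross = t_y * t_x * (big_r ** 2 / big_n + r_x ** 2 / n)
    return a * (t_y ** 2 + t_x ** 2) - cross


def rel_var_eq16(p_y, p_x, big_n, n):
    """Approximation to (15) whose square root, times r_e(y), gives (16)."""
    a, t_y, t_x = _shared_terms(p_y, p_x, big_n, n)
    return a * (t_y - t_x) ** 2


# Hypothetical large-range correlations, tried only to show the inequality.
approx = rel_var_eq16(1.200, 1.600, 400, 100)
gaps = [rel_var_eq15(1.200, 1.600, big_r, 0.80, 400, 100) - approx
        for big_r in (0.0, 0.5, 0.8, 1.0)]
print(all(g > 0 for g in gaps))
```

The difference between the two expressions is proportional to (1 - R²)/N + (1 - r²)/n, which is why it shrinks as N and n grow and never becomes negative.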











If $N$ and $n$ are large, the following formula gives an approximate value for the standard error of $R_{e(y)}$,

$$\sigma_{R_{e(y)}} = R_{e(y)}\sqrt{\frac{1}{2N} + \frac{1}{2n}}\left(\frac{P_y^2}{P_y^2 - 1} - \frac{P_x^2}{P_x^2 - 1}\right) \tag{16a}$$

If $\mathrm{CR}_{(y)}$ is the critical ratio in this case,

$$\mathrm{CR}_{(y)} = \frac{R_{e(y)} - R_x}{\sqrt{R_{e(y)}^2\left(\dfrac{1}{2N} + \dfrac{1}{2n}\right)\left(\dfrac{P_y^2}{P_y^2 - 1} - \dfrac{P_x^2}{P_x^2 - 1}\right)^2 + \dfrac{(1 - R_x^2)^2}{N}}} \tag{17a}$$

Equations (16a) and (17a) are equal to equations (16) and (17) respectively, except for the fact that the correlations in the large range have taken the place of the correlations in the small range.

For a practical illustration, let $P_y = 1.200$, $P_x = 1.600$, $r_x = .8000$, $n = 100$, and $N = 400$.

By (4),

$$r_{e(y)} = \frac{\sqrt{1.200^2 - 1}}{\sqrt{1.600^2 - 1}} = .5331$$

By (16),

$$\sigma_{r_{e(y)}} = .5331\,\sqrt{\frac{1}{800} + \frac{1}{200}}\left(\frac{1.200^2}{1.200^2 - 1} - \frac{1.600^2}{1.600^2 - 1}\right) = .06877$$

By (17),

$$\mathrm{CR} = \frac{.8000 - .5331}{\sqrt{.06877^2 + \dfrac{(1 - .80^2)^2}{100}}} = 3.44$$

By formula (15) the value of $\sigma_{r_{e(y)}}$ was found to be .1076 instead of .06877 as given by (16), and that of CR was found to be 2.35 instead of 3.44. This will give an idea of the accuracy of formula (16). In either case the difference may be considered significant, as data yielding a critical ratio greater than 2.00 should not be accepted as fulfilling the conditions of rectilinearity and homoscedasticity for the purpose described here.
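The arithmetic of the illustration can be retraced with a few lines of Python. This is a sketch added for checking purposes, not the author's procedure; the names are assumptions. The estimate .5331 is taken as printed in the text (evaluating (4) directly with these values of P gives about .5311), so that the later figures can be compared with the printed ones.

```python
import math

P_Y, P_X = 1.200, 1.600      # range ratios from the illustration
R_SMALL = 0.8000             # r_x, correlation observed in the small range
N_SMALL, N_LARGE = 100, 400  # n and N
R_EST = 0.5331               # r_e(y) as printed in the text


def se_eq16(r_est, p_y, p_x, big_n, n):
    """Equation (16): approximate standard error of r_e(y)."""
    spread = p_y ** 2 / (p_y ** 2 - 1.0) - p_x ** 2 / (p_x ** 2 - 1.0)
    return r_est * math.sqrt(1.0 / (2 * big_n) + 1.0 / (2 * n)) * spread


def critical_ratio(r_x, r_est, se_est, n):
    """Equation (17): (r_x - r_e(y)) over the standard error of the difference."""
    se_r_x = (1.0 - r_x ** 2) / math.sqrt(n)  # equation (7)
    return (r_x - r_est) / math.sqrt(se_est ** 2 + se_r_x ** 2)


se16 = se_eq16(R_EST, P_Y, P_X, N_LARGE, N_SMALL)
cr16 = critical_ratio(R_SMALL, R_EST, se16, N_SMALL)
cr15 = critical_ratio(R_SMALL, R_EST, 0.1076, N_SMALL)  # .1076 is the value quoted for (15)
print(round(se16, 5), round(cr16, 2), round(cr15, 2))   # 0.06877 3.44 2.35
```

Both critical ratios exceed 2.00, so the conclusion drawn in the text does not depend on which of the two standard errors is used.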








